Project Nature and Content - Computer Vision:
Sample code is available at this GitHub address: https://github.com/KartikNW/MSDS_458_Public. GitHub 101: GitHub101.pdf https://canvas.northwestern.edu/courses/221322/files/19844794/download?wrap=1
Think of the first assignment as serving multiple purposes: (1) exploring neural nets/seeing how they work on a very simple problem, (2) examining alternative neural net structures with a simple, single-hidden layer network, and (3) learning how to fit a neural network directly in Python (or Scikit Learn, TensorFlow, or Keras). This first assignment gives you a choice as to which of these objectives to emphasize.
Bottom line. You may choose the vision data set that you will be looking at, assuming that it is a simple alphabetic or numeric data classification problem. And you may choose the Python coding framework that you use to build the neural net. The neural network should be a fully connected (dense) neural network with a single hidden layer.
This first assignment deals with neural networks for classification of images. The structure of the network should be simple, with only one internal/hidden layer. The intent of the assignment is to give you hands-on, practical experience not only with designing, training, and assessing a neural network and interpreting the impact of hyperparameters, but also with going one step further.
Regarding exploration, the goal is to understand how the neurons/nodes in a simple single-hidden layer network have learned to represent features within the input data.
Regarding the management problem for this assignment: suppose you are asked to develop a neural network model for digit classification. How would you go about training such a model? How would you judge the model's accuracy in digit classification with real data examples, such as customer or client handwritten digits on paper?
You will do this exclusively using the backpropagation learning method. You will have gathered and preprocessed your data, designed and refined your network structure, trained and tested the network, varied the hyperparameters to improve performance, and analyzed/assessed the results.
The most important thing is not just to give a summary of classification rates/errors; I trust that you can get a working classifier, or train a network to do some useful task. What matters is to identify, for each different class of input data, what it is that the hidden nodes are responding to.
You may use MNIST data for the first assignment. You can train and test a classifier on this data. But the core challenge is still to figure out what it is that the hidden nodes are responding to, and making the task more complex will not change this core focus. You need to conduct a minimum of the following 5 experiments for this data in order to get some useful insights. You are welcome to conduct more experiments.
EXPERIMENT 1: Our dense neural network will consist of 784 input nodes, a hidden layer with 1 node, and 10 output nodes (corresponding to the 10 digits). We use mnist.load_data() to get the 70,000 images divided into a set of 60,000 training images and 10,000 test images. We hold back 5,000 of the 60,000 training images for validation. After training the model, we group the activation values of the hidden node for the training images by the 10 predicted classes and visualize these sets of values using a boxplot. We expect the overlap between the range of values in the "boxes" to be minimal. In addition, we find the pattern that maximally activates the hidden node as a "warm-up" exercise for the similar analysis we will perform on CNN models in Assignment 2.
EXPERIMENT 2: This time our dense neural network will have 784 input nodes, a hidden layer with 2 nodes, and 10 output nodes (corresponding to the 10 digits). For each training image, the outputs of the two hidden nodes are plotted using a scatterplot. We color-code the points according to which of the 10 classes the outputs of the two nodes predict. Ideally, just like in EXPERIMENT 1, the color clusters should have very little overlap. Also compare the accuracy % and confusion matrix of Experiments 1 & 2. Again, the goal is to get more insights.
EXPERIMENT 3: You can explore networks with more hidden nodes, then settle on one 'final' model, i.e. the 'best' model.
EXPERIMENT 4: Use PCA decomposition to reduce the number of dimensions of our training set of 28x28 MNIST images from 784 to 154 (with 95% of the training images' variance lying along these components). We also reduce the 'best' model from Experiment 3 to 154 input nodes and train it on the new lower-dimensional data. We then compare the performance of Experiments 3 and 4.
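The PCA step for EXPERIMENT 4 can be sketched as follows. This is a minimal, self-contained illustration: random data stands in for the flattened MNIST images, so the number of retained components will differ from the roughly 154 obtained on real MNIST.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(43)
X = rng.random((1000, 784)).astype('float32')  # stand-in for flattened 28x28 images

# Passing a float in (0, 1) keeps the smallest number of components whose
# cumulative explained variance reaches that fraction (0.95 here).
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (1000, n_kept_components)
print(pca.explained_variance_ratio_.sum())  # at least 0.95
```

The reduced array `X_reduced` is what the smaller-input network would be trained on; for the test set, apply `pca.transform` (not `fit_transform`) so both sets share the same components.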
EXPERIMENT 5: We use a Random Forest classifier to get the relative importance of the 784 features (pixels) of the 28x28 images in the MNIST training set and select the top 70 features (pixels). We train our 'best' dense neural network using these 70 features and compare its performance to the dense neural network models from EXPERIMENTS 3 and 4.
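The feature-selection step for EXPERIMENT 5 can be sketched as below. Again random data and labels stand in for MNIST so the snippet runs on its own; the pixel importances here are meaningless, but the mechanics are the same.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(43)
X = rng.random((500, 784)).astype('float32')  # stand-in for flattened images
y = rng.integers(0, 10, size=500)             # stand-in for digit labels

rf = RandomForestClassifier(n_estimators=50, random_state=43)
rf.fit(X, y)

# Impurity-based importances sum to 1; take the indices of the 70 largest.
top70 = np.argsort(rf.feature_importances_)[::-1][:70]
X_top = X[:, top70]  # 70-feature training set for the dense network
print(X_top.shape)   # (500, 70)
```

The same `top70` index array must also be applied to the test images so the network sees the identical 70 pixels at evaluation time.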
from IPython.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))
Importing Packages¶
First we import all the packages that will be used in the assignment.
Since Keras is integrated into TensorFlow 2.x, we import
keras from tensorflow and use tensorflow.keras.xxx to import all other Keras packages. The seed argument produces a deterministic sequence of tensors across multiple calls.
import datetime
from packaging import version
from collections import Counter
import numpy as np
import pandas as pd
import random
import matplotlib as mpl # EA
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import accuracy_score
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
import tensorflow.keras.backend as k
from tensorflow.python.client import device_lib
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
np.set_printoptions(precision=3, suppress=True)
Verify TensorFlow version¶
print("This notebook requires TensorFlow 2.0 or above")
print("TensorFlow version: ", tf.__version__)
assert version.parse(tf.__version__).release[0] >=2
This notebook requires TensorFlow 2.0 or above TensorFlow version: 2.12.0
Mount Google Drive to Colab environment¶
#from google.colab import drive
#drive.mount('/content/gdrive')
Research Assignment Reporting Functions¶
def print_validation_report(test_labels, predictions):
print("Classification Report")
print(classification_report(test_labels, predictions))
print('Accuracy Score: {}'.format(accuracy_score(test_labels, predictions)))
print('Root Mean Square Error: {}'.format(np.sqrt(MSE(test_labels, predictions))))
def plot_confusion_matrix(y_true, y_pred):
mtx = confusion_matrix(y_true, y_pred)
fig, ax = plt.subplots(figsize=(16,12))
sns.heatmap(mtx, annot=True, fmt='d', linewidths=.75, cbar=False, ax=ax,cmap='Blues',linecolor='white')
# square=True,
plt.ylabel('true label')
plt.xlabel('predicted label')
return mtx
def plot_history(history):
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
plt.subplot(1, 2, i + 1)
plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
plt.legend()
plt.show()
def plot_digits(instances, pos, images_per_row=5, **options):
size = 28
images_per_row = min(len(instances), images_per_row)
images = [instance.reshape(size,size) for instance in instances]
n_rows = (len(instances) - 1) // images_per_row + 1
row_images = []
n_empty = n_rows * images_per_row - len(instances)
images.append(np.zeros((size, size * n_empty)))
for row in range(n_rows):
rimages = images[row * images_per_row : (row + 1) * images_per_row]
row_images.append(np.concatenate(rimages, axis=1))
image = np.concatenate(row_images, axis=0)
pos.imshow(image, cmap = 'binary', **options)
pos.axis("off")
def plot_digit(data):
image = data.reshape(28, 28)
plt.imshow(image, cmap = 'hot',
interpolation="nearest")
plt.axis("off")
def display_training_curves(training, validation, title, subplot):
ax = plt.subplot(subplot)
ax.plot(training)
ax.plot(validation)
ax.set_title('model '+ title)
ax.set_ylabel(title)
ax.set_xlabel('epoch')
ax.legend(['training', 'validation'])
seed_val = 43
np.random.seed(seed_val)
random.seed(seed_val)
tf.random.set_seed(seed_val)
Loading MNIST Dataset¶
- The MNIST dataset of handwritten digits has a training set of 60,000 images and a test set of 10,000 images. It comes prepackaged as part of
tf.keras. Use the tf.keras.datasets.mnist.load_data function to get these datasets (and the corresponding labels) as NumPy arrays.
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
- Tuples of NumPy arrays: (x_train, y_train), (x_test, y_test)
- x_train, x_test: uint8 arrays of grayscale image data with shape (num_samples, 28, 28)
- y_train, y_test: uint8 arrays of digit labels (integers in range 0-9)
EDA Training and Test Sets¶
- Inspect the training and test sets as well as their labels as follows.
print('x_train:\t{}'.format(train_images.shape))
print('y_train:\t{}'.format(train_labels.shape))
print('x_test:\t\t{}'.format(test_images.shape))
print('y_test:\t\t{}'.format(test_labels.shape))
x_train: (60000, 28, 28) y_train: (60000,) x_test: (10000, 28, 28) y_test: (10000,)
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
Validation Dataset¶
- Create validation set from training set: 5000 images
val_images, train_images = train_images[:5000], train_images[5000:]
val_labels, train_labels = train_labels[:5000], train_labels[5000:]
Review labels for training set¶
print("First ten labels training dataset:\n {}\n".format(train_labels[0:10]))
First ten labels training dataset: [7 3 4 6 1 8 1 0 9 8]
Find frequency of each label in training and test sets¶
# reload as we have removed 5000 for validation
(train_images_dist, train_labels_dist), (test_images_dist, test_labels_dist) = mnist.load_data()
plt.figure(figsize = (12 ,8))
items = [{'Class': x, 'Count': y} for x, y in Counter(train_labels_dist).items()]
distribution = pd.DataFrame(items).sort_values(['Class'])
sns.barplot(x=distribution.Class, y=distribution.Count);
Counter(train_labels_dist).most_common()
[(1, 6742), (7, 6265), (3, 6131), (2, 5958), (9, 5949), (0, 5923), (6, 5918), (8, 5851), (4, 5842), (5, 5421)]
Counter(test_labels_dist).most_common()
[(1, 1135), (2, 1032), (7, 1028), (3, 1010), (9, 1009), (4, 982), (0, 980), (8, 974), (6, 958), (5, 892)]
Plot sample images with their labels¶
fig = plt.figure(figsize = (15, 9))
for i in range(50):
plt.subplot(5, 10, 1+i)
plt.title(train_labels_dist[i])
plt.xticks([])
plt.yticks([])
plt.imshow(train_images_dist[i].reshape(28,28), cmap='binary')
np.set_printoptions(linewidth=np.inf)
print("{}".format(train_images_dist[2020]))
[[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 167 208 19 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 13 235 254 99 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 74 254 234 4 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 154 254 145 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 224 254 92 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 51 245 211 13 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 2 169 254 101 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 27 254 254 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 72 255 241 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 88 254 153 0 0 33 53 155 156 102 15 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 130 254 31 0 128 235 254 254 254 254 186 10 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 190 254 51 178 254 246 213 111 109 186 254 145 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 192 254 229 254 216 90 0 0 0 57 254 234 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 235 254 254 247 85 0 0 0 0 32 254 234 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 235 254 254 118 0 0 0 0 0 107 254 201 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 235 255 254 102 12 0 0 0 8 188 248 119 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 207 254 254 238 107 0 0 39 175 254 148 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 84 254 248 74 11 32 115 238 254 176 11 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 21 214 254 254 254 254 254 254 132 6 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 14 96 176 254 254 214 48 12 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
Creating the DNN Model¶
- In this step, we first choose the network architecture for the model. Then we build, compile, train, and evaluate the model.
Build the DNN model¶
We use the Sequential class defined in Keras to create our model. All the layers are going to be Dense layers. This means, as in the figure shown above, all the nodes of a layer are connected to all the nodes of the preceding layer, i.e. densely connected.
After the model is built, we view a summary of its layers and parameter counts.
Experiment 1¶
- 784 Input Nodes
- hidden layer: 1 node
- output layer: 10 nodes
# k.clear_session()
model = Sequential([
Dense(name = 'hidden_layer_1', units=1, activation='relu', input_shape=[784]),
Dense(name = 'output_layer', units = 10, activation ='softmax')
])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
hidden_layer_1 (Dense) (None, 1) 785
output_layer (Dense) (None, 10) 20
=================================================================
Total params: 805
Trainable params: 805
Non-trainable params: 0
_________________________________________________________________
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Compile the DNN model¶
In addition to setting up our model architecture, we also need to define which algorithm the model should use to optimize the weights and biases for the given data. We will use RMSprop, a variant of stochastic gradient descent.
We also need to define a loss function. Think of this function as measuring the difference between the predicted outputs and the actual outputs given in the dataset. This loss needs to be minimized in order to have a higher model accuracy. That's what the optimization algorithm essentially does - it minimizes the loss during model training. For our multi-class classification problem, categorical cross-entropy is commonly used; since our labels are integers rather than one-hot vectors, we use the sparse variant.
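As a small worked example of the loss used below: for a single sample, (sparse) categorical cross-entropy is simply the negative log of the probability the model assigns to the true class. The probabilities here are made up for illustration.

```python
import numpy as np

# Hypothetical softmax output for one sample (10 class probabilities, sums to 1)
probs = np.array([0.05, 0.70, 0.05, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02])
true_label = 1  # integer label, as expected by sparse_categorical_crossentropy

# Cross-entropy for one sample: -log(probability of the true class)
loss = -np.log(probs[true_label])
print(round(float(loss), 4))  # -log(0.70) = 0.3567
```

A confident correct prediction (probability near 1) gives a loss near 0, while a confident wrong one is penalized heavily; the training loss is the mean of this quantity over the batch.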
Finally, we will use the accuracy during training as a metric to keep track of as the model trains.
model.compile(optimizer='rmsprop',
loss = 'sparse_categorical_crossentropy',
metrics=['accuracy'])
Train the DNN model¶
tf.keras.Model.fit
https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
history = model.fit( train_images
, train_labels
, epochs=30
, validation_data=(val_images, val_labels)
, callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_1_optimized.h5",save_best_only=True,save_weights_only=False)]
)
Epoch 1/30 154/1719 [=>............................] - ETA: 0s - loss: 2.1843 - accuracy: 0.1613
2024-10-06 23:58:08.350857: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
1719/1719 [==============================] - 1s 352us/step - loss: 1.9725 - accuracy: 0.2158 - val_loss: 1.8652 - val_accuracy: 0.2596 Epoch 2/30 1719/1719 [==============================] - 1s 294us/step - loss: 1.7900 - accuracy: 0.2817 - val_loss: 1.7219 - val_accuracy: 0.2974 Epoch 3/30 1719/1719 [==============================] - 1s 293us/step - loss: 1.6930 - accuracy: 0.2992 - val_loss: 1.6711 - val_accuracy: 0.2974 Epoch 4/30 1719/1719 [==============================] - 1s 296us/step - loss: 1.6588 - accuracy: 0.3037 - val_loss: 1.6456 - val_accuracy: 0.3006 Epoch 5/30 1719/1719 [==============================] - 0s 289us/step - loss: 1.6414 - accuracy: 0.3058 - val_loss: 1.6329 - val_accuracy: 0.3008 Epoch 6/30 1719/1719 [==============================] - 0s 290us/step - loss: 1.6302 - accuracy: 0.3091 - val_loss: 1.6230 - val_accuracy: 0.3024 Epoch 7/30 1719/1719 [==============================] - 0s 289us/step - loss: 1.6219 - accuracy: 0.3111 - val_loss: 1.6216 - val_accuracy: 0.3054 Epoch 8/30 1719/1719 [==============================] - 1s 295us/step - loss: 1.6156 - accuracy: 0.3131 - val_loss: 1.6163 - val_accuracy: 0.2998 Epoch 9/30 1719/1719 [==============================] - 1s 294us/step - loss: 1.6107 - accuracy: 0.3172 - val_loss: 1.6093 - val_accuracy: 0.3336 Epoch 10/30 1719/1719 [==============================] - 1s 298us/step - loss: 1.6055 - accuracy: 0.3262 - val_loss: 1.6089 - val_accuracy: 0.3396 Epoch 11/30 1719/1719 [==============================] - 1s 294us/step - loss: 1.6012 - accuracy: 0.3371 - val_loss: 1.6012 - val_accuracy: 0.3458 Epoch 12/30 1719/1719 [==============================] - 1s 293us/step - loss: 1.5968 - accuracy: 0.3450 - val_loss: 1.5939 - val_accuracy: 0.3520 Epoch 13/30 1719/1719 [==============================] - 1s 336us/step - loss: 1.5913 - accuracy: 0.3499 - val_loss: 1.5872 - val_accuracy: 0.3648 Epoch 14/30 1719/1719 [==============================] - 1s 295us/step - loss: 1.5856 - accuracy: 0.3554 - 
val_loss: 1.5802 - val_accuracy: 0.3684 Epoch 15/30 1719/1719 [==============================] - 1s 292us/step - loss: 1.5814 - accuracy: 0.3578 - val_loss: 1.5831 - val_accuracy: 0.3606 Epoch 16/30 1719/1719 [==============================] - 1s 294us/step - loss: 1.5784 - accuracy: 0.3550 - val_loss: 1.5746 - val_accuracy: 0.3712 Epoch 17/30 1719/1719 [==============================] - 1s 293us/step - loss: 1.5751 - accuracy: 0.3589 - val_loss: 1.5716 - val_accuracy: 0.3774 Epoch 18/30 1719/1719 [==============================] - 1s 293us/step - loss: 1.5733 - accuracy: 0.3597 - val_loss: 1.5781 - val_accuracy: 0.3848 Epoch 19/30 1719/1719 [==============================] - 1s 292us/step - loss: 1.5713 - accuracy: 0.3608 - val_loss: 1.5694 - val_accuracy: 0.3722 Epoch 20/30 1719/1719 [==============================] - 0s 290us/step - loss: 1.5699 - accuracy: 0.3638 - val_loss: 1.5667 - val_accuracy: 0.3758 Epoch 21/30 1719/1719 [==============================] - 1s 294us/step - loss: 1.5690 - accuracy: 0.3641 - val_loss: 1.5665 - val_accuracy: 0.3886 Epoch 22/30 1719/1719 [==============================] - 1s 291us/step - loss: 1.5675 - accuracy: 0.3681 - val_loss: 1.5669 - val_accuracy: 0.3852 Epoch 23/30 1719/1719 [==============================] - 1s 294us/step - loss: 1.5659 - accuracy: 0.3733 - val_loss: 1.5640 - val_accuracy: 0.3926 Epoch 24/30 1719/1719 [==============================] - 1s 292us/step - loss: 1.5630 - accuracy: 0.3817 - val_loss: 1.5590 - val_accuracy: 0.3988 Epoch 25/30 1719/1719 [==============================] - 1s 294us/step - loss: 1.5572 - accuracy: 0.3922 - val_loss: 1.5570 - val_accuracy: 0.4018 Epoch 26/30 1719/1719 [==============================] - 1s 291us/step - loss: 1.5491 - accuracy: 0.3905 - val_loss: 1.5549 - val_accuracy: 0.3924 Epoch 27/30 1719/1719 [==============================] - 1s 292us/step - loss: 1.5430 - accuracy: 0.3887 - val_loss: 1.5384 - val_accuracy: 0.3910 Epoch 28/30 1719/1719 
[==============================] - 1s 292us/step - loss: 1.5388 - accuracy: 0.3871 - val_loss: 1.5353 - val_accuracy: 0.3860 Epoch 29/30 1719/1719 [==============================] - 1s 292us/step - loss: 1.5348 - accuracy: 0.3846 - val_loss: 1.5270 - val_accuracy: 0.3842 Epoch 30/30 1719/1719 [==============================] - 1s 291us/step - loss: 1.5322 - accuracy: 0.3840 - val_loss: 1.5261 - val_accuracy: 0.3870
model = tf.keras.models.load_model("exp_1_optimized.h5")
Evaluate the DNN model¶
In order to ensure that this is not simple "memorization" by the machine, we should evaluate the performance on the test set. This is easy to do: we simply use the evaluate method on our model.
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'test acc: {test_acc}, test loss: {test_loss}')
313/313 [==============================] - 0s 252us/step - loss: 1.5491 - accuracy: 0.3818 test acc: 0.38179999589920044, test loss: 1.5491249561309814
Plot performance metrics¶
We use Matplotlib to create two plots, displaying the training and validation loss (resp. accuracy) for each (training) epoch side by side.
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
Making Predictions¶
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+
# pred_classes = model.predict(train_images)
# alternate method:
pred_train=model.predict(train_images)
pred_classes=np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 189us/step
print_validation_report(train_labels, pred_classes)
Classification Report
precision recall f1-score support
0 0.52 0.45 0.48 5444
1 0.43 0.88 0.58 6179
2 0.23 0.38 0.29 5470
3 0.27 0.53 0.36 5638
4 0.00 0.00 0.00 5307
5 0.00 0.00 0.00 4987
6 0.38 0.45 0.41 5417
7 0.55 0.86 0.67 5715
8 0.00 0.00 0.00 5389
9 0.35 0.16 0.22 5454
accuracy 0.38 55000
macro avg 0.27 0.37 0.30 55000
weighted avg 0.28 0.38 0.31 55000
Accuracy Score: 0.3846
Root Mean Square Error: 3.233677894792975
Create the confusion matrix¶
Let us see what the confusion matrix looks like, using both tf.math.confusion_matrix and sklearn.metrics.confusion_matrix. Then we visualize the confusion matrix and see what it tells us.
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00% | 8.83% | 0.05% | 0.39% | 4.47% | 0.06% | 0.00% | 55.68% | 0.28% | 30.24% |
| 1 | 0.64% | 14.44% | 12.32% | 22.97% | 14.98% | 11.79% | 1.42% | 0.04% | 20.24% | 1.16% |
| 2 | 0.00% | 16.55% | 0.19% | 1.17% | 9.21% | 0.20% | 0.00% | 37.29% | 0.86% | 34.52% |
| 3 | 22.75% | 0.04% | 19.22% | 6.16% | 0.11% | 15.21% | 29.37% | 0.00% | 7.14% | 0.00% |
| 4 | 0.09% | 30.87% | 4.34% | 13.44% | 24.71% | 4.38% | 0.24% | 1.35% | 10.94% | 9.64% |
| 5 | 5.12% | 1.52% | 23.26% | 19.24% | 2.39% | 20.39% | 8.83% | 0.00% | 19.24% | 0.01% |
| 6 | 0.12% | 29.47% | 5.03% | 14.76% | 24.26% | 5.06% | 0.30% | 0.95% | 12.12% | 7.94% |
| 7 | 44.25% | 0.00% | 6.66% | 0.81% | 0.00% | 4.75% | 42.44% | 0.00% | 1.09% | 0.00% |
| 8 | 0.00% | 18.14% | 0.23% | 1.40% | 10.27% | 0.25% | 0.00% | 34.04% | 1.02% | 34.65% |
| 9 | 6.96% | 0.90% | 24.01% | 17.00% | 1.54% | 20.70% | 11.45% | 0.00% | 17.42% | 0.01% |
| 10 | 37.92% | 0.00% | 10.28% | 1.74% | 0.01% | 7.59% | 40.24% | 0.00% | 2.22% | 0.00% |
| 11 | 7.25% | 0.84% | 24.07% | 16.69% | 1.45% | 20.70% | 11.85% | 0.00% | 17.15% | 0.00% |
| 12 | 0.03% | 33.30% | 2.14% | 8.28% | 23.83% | 2.22% | 0.08% | 4.91% | 6.51% | 18.69% |
| 13 | 16.04% | 0.14% | 22.44% | 9.56% | 0.31% | 18.32% | 22.59% | 0.00% | 10.60% | 0.00% |
| 14 | 0.00% | 8.83% | 0.05% | 0.39% | 4.47% | 0.06% | 0.00% | 55.68% | 0.28% | 30.24% |
| 15 | 35.47% | 0.00% | 11.75% | 2.22% | 0.01% | 8.79% | 38.96% | 0.00% | 2.80% | 0.00% |
| 16 | 3.86% | 2.33% | 22.19% | 20.97% | 3.42% | 19.74% | 6.93% | 0.00% | 20.54% | 0.03% |
| 17 | 0.00% | 8.83% | 0.05% | 0.39% | 4.47% | 0.06% | 0.00% | 55.68% | 0.28% | 30.24% |
| 18 | 44.95% | 0.00% | 6.29% | 0.73% | 0.00% | 4.46% | 42.57% | 0.00% | 0.99% | 0.00% |
| 19 | 59.92% | 0.00% | 0.89% | 0.03% | 0.00% | 0.55% | 38.56% | 0.00% | 0.05% | 0.00% |
Visualize the confusion matrix¶
We use code from chapter 3 of Hands on Machine Learning (A. Geron) (cf. https://github.com/ageron/handson-ml2/blob/master/03_classification.ipynb) to display a "heat map" of the confusion matrix. Then we normalize the confusion matrix so we can compare error rates.
mtx = plot_confusion_matrix(train_labels,pred_classes)
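The row normalization mentioned above (following Geron, ch. 3) can be sketched as below. A toy 3-class matrix stands in for the 10-class `mtx` returned above, so the snippet is self-contained; the idea is to divide each row by the number of images in that true class and zero the diagonal so only the error rates stand out.

```python
import numpy as np

# Toy confusion matrix: rows are true classes, columns are predicted classes
mtx_demo = np.array([[40,  5,  5],
                     [ 2, 45,  3],
                     [ 1,  4, 45]])

row_sums = mtx_demo.sum(axis=1, keepdims=True)
norm_mtx = mtx_demo / row_sums       # each cell: fraction of that true class
np.fill_diagonal(norm_mtx, 0)        # zero the diagonal to highlight errors
print(norm_mtx.round(2))
```

The normalized matrix can be passed to the same seaborn heatmap call as before, making the off-diagonal error rates comparable across classes of different sizes.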
Get Activation Values of the Hidden Node¶
To get the activation values of the hidden nodes, we need to create a new model, activation_model, that takes the same input as our current model but outputs the activation value of the hidden layer, i.e. of the hidden node. Then use the predict function to get the activation values.
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
print(f"There are {len(layer_outputs)} layers")
layer_outputs # description of the layers
There are 2 layers
[<KerasTensor: shape=(None, 1) dtype=float32 (created by layer 'hidden_layer_1')>, <KerasTensor: shape=(None, 10) dtype=float32 (created by layer 'output_layer')>]
# Get the output of the hidden node for each of the 55000 training images
activations = activation_model.predict(train_images)
hidden_layer_activation = activations[0]
hidden_layer_activation.shape # hidden node has one activation value per training image
1719/1719 [==============================] - 0s 217us/step
(55000, 1)
print(f"The maximum activation value of the hidden node is {hidden_layer_activation.max()}")
The maximum activation value of the hidden node is 14.317005157470703
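The "pattern that maximally activates the hidden node" warm-up from EXPERIMENT 1 can be sketched via gradient ascent on the input. A freshly built (untrained) stand-in model is used here so the snippet runs on its own; with the trained model loaded above, the resulting pattern shows what the hidden node responds to.

```python
import tensorflow as tf
from tensorflow import keras

# Stand-in for the trained 1-hidden-node model; in the notebook, reuse `model`
demo = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(1, activation='relu', name='hidden_layer_1'),
    keras.layers.Dense(10, activation='softmax', name='output_layer'),
])
hidden = keras.Model(demo.inputs, demo.layers[0].output)

img = tf.Variable(tf.random.uniform((1, 784)))  # start from random noise
step = 0.1
for _ in range(100):
    with tf.GradientTape() as tape:
        act = hidden(img)                        # hidden-node activation
    grads = tape.gradient(act, img)
    # Normalized gradient *ascent*: move the image toward higher activation
    img.assign_add(step * grads / (tf.norm(grads) + 1e-8))

pattern = img.numpy().reshape(28, 28)            # view with plt.imshow(pattern)
print(pattern.shape)
```

This is the same input-space gradient-ascent idea used to visualize CNN filters in Assignment 2; for a single dense ReLU node the maximizing direction is essentially the node's weight vector itself.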
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True) # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output node has shape {output_layer_activation.shape}")
print(f"The output for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output node has shape (55000, 10) The output for the first image are [0. 0.088 0.001 0.004 0.045 0.001 0. 0.557 0.003 0.302] The sum of the probabilities is (approximately) 1.0
boxplot_df = pd.DataFrame({'act_value':hidden_layer_activation.reshape(55000),
'pred_class':pred_classes})
boxplot_df.head()
| act_value | pred_class | |
|---|---|---|
| 0 | 0.000000 | 7 |
| 1 | 2.302789 | 3 |
| 2 | 0.304156 | 7 |
| 3 | 5.188391 | 6 |
| 4 | 1.470149 | 1 |
Visualize the activation values with boxplots¶
We get the activation values of the hidden node and combine them with the corresponding predicted classes into a DataFrame. We use both matplotlib and seaborn to create boxplots from the DataFrame.
# To see how closely the hidden node's activation values correlate with the class predictions
# Note that no 4s, 5s, or 8s were predicted and that there were outliers in the activation values for the 6s
boxplot_df[['act_value','pred_class']].boxplot(by ='pred_class', column =['act_value'], grid = True)
<Axes: title={'center': 'act_value'}, xlabel='pred_class'>
boxplot_df['pred_class'].value_counts() # Another way to verify what the boxplot is telling us
pred_class 1 12614 3 10871 7 8982 2 8968 6 6436 0 4676 9 2453 Name: count, dtype: int64
# Let us use seaborn for the boxplots this time.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_value', x='pred_class',
data=boxplot_df,
width=0.5,
palette="colorblind")
cl_a, cl_b = 1, 9
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]
plt.figure(figsize=(8,8))
p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)
plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);
p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")
# plt.savefig("error_analysis_digits_plot_EXP1_valid")
plt.show()
Experiment 2¶
- 784 Input Nodes
- hidden layer: 2 nodes
- output layer: 10 nodes
# k.clear_session()
model = Sequential([
Dense(name = 'hidden_layer_1', units=2, activation='relu', input_shape=[784]),
Dense(name = 'output_layer', units = 10, activation ='softmax')
])
Build the DNN model¶
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
hidden_layer_1 (Dense) (None, 2) 1570
output_layer (Dense) (None, 10) 30
=================================================================
Total params: 1,600
Trainable params: 1,600
Non-trainable params: 0
_________________________________________________________________
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Compile the DNN model¶
model.compile(optimizer='rmsprop',
loss = 'sparse_categorical_crossentropy',
metrics=['accuracy'])
Train the DNN model¶
history = model.fit( train_images
, train_labels
, epochs=30
, validation_data=(val_images, val_labels)
, callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_2_optimized.h5",save_best_only=True,save_weights_only=False)]
)
Epoch 1/30
1719/1719 [==============================] - 1s 333us/step - loss: 1.6624 - accuracy: 0.4288 - val_loss: 1.4240 - val_accuracy: 0.5120
Epoch 2/30
1719/1719 [==============================] - 1s 303us/step - loss: 1.3540 - accuracy: 0.5343 - val_loss: 1.2949 - val_accuracy: 0.5608
Epoch 3/30
1719/1719 [==============================] - 1s 304us/step - loss: 1.2780 - accuracy: 0.5690 - val_loss: 1.2404 - val_accuracy: 0.5916
... (epochs 4-27 omitted: loss and accuracy improve steadily) ...
Epoch 28/30
1719/1719 [==============================] - 1s 299us/step - loss: 1.0793 - accuracy: 0.6573 - val_loss: 1.0870 - val_accuracy: 0.6570
Epoch 29/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.0762 - accuracy: 0.6571 - val_loss: 1.0618 - val_accuracy: 0.6736
Epoch 30/30
1719/1719 [==============================] - 1s 301us/step - loss: 1.0743 - accuracy: 0.6571 - val_loss: 1.0690 - val_accuracy: 0.6584
Evaluate the DNN model¶
model = tf.keras.models.load_model("exp_2_optimized.h5")
test_loss, test_acc = model.evaluate(test_images, test_labels)
313/313 [==============================] - 0s 262us/step - loss: 1.0713 - accuracy: 0.6571
print(f'test acc: {test_acc}, test loss: {test_loss}')
test acc: 0.6571000218391418, test loss: 1.0712623596191406
Plot performance metrics¶
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
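The `display_training_curves` helper is defined earlier in the notebook. A minimal sketch of what it might look like (the exact labels and styling are assumptions; the 3-digit `subplot` code, e.g. 211, follows matplotlib's convention):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def display_training_curves(training, validation, title, subplot):
    """Plot a training metric and its validation counterpart on one subplot.

    `subplot` uses matplotlib's 3-digit convention, e.g. 211 = top half.
    """
    ax = plt.subplot(subplot)
    ax.plot(training, label='train ' + title)
    ax.plot(validation, label='val ' + title)
    ax.set_ylabel(title)
    ax.set_xlabel('epoch')
    ax.legend()
    return ax
```

Called twice with codes 211 and 212, it stacks the accuracy and loss curves in one figure.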
Making Predictions¶
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+;
# use model.predict() followed by np.argmax() instead:
pred_train = model.predict(train_images)
pred_classes = np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 192us/step
print_validation_report(train_labels, pred_classes)
Classification Report
precision recall f1-score support
0 0.79 0.80 0.79 5444
1 0.85 0.92 0.88 6179
2 0.60 0.45 0.52 5470
3 0.68 0.70 0.69 5638
4 0.51 0.47 0.49 5307
5 0.70 0.64 0.67 4987
6 0.80 0.75 0.77 5417
7 0.69 0.82 0.75 5715
8 0.46 0.55 0.50 5389
9 0.47 0.44 0.46 5454
accuracy 0.66 55000
macro avg 0.66 0.65 0.65 55000
weighted avg 0.66 0.66 0.66 55000
Accuracy Score: 0.6596545454545455
Root Mean Square Error: 2.541706656416654
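`print_validation_report` is a helper defined earlier in the notebook. A hedged sketch of an equivalent built on scikit-learn (the function name and exact output format above are from the notebook; the implementation here is an assumption):

```python
import numpy as np
from sklearn.metrics import classification_report, accuracy_score, mean_squared_error

def print_validation_report(y_true, y_pred):
    """Print a classification report, accuracy, and RMSE; return the scores."""
    print("Classification Report")
    print(classification_report(y_true, y_pred))
    acc = accuracy_score(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print(f"Accuracy Score: {acc}")
    print(f"Root Mean Square Error: {rmse}")
    return acc, rmse
```

Note that RMSE on class labels treats the digits as ordinal values, so it is only a rough error summary alongside the per-class precision/recall.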
Create the confusion matrix¶
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00% | 0.01% | 9.70% | 0.00% | 3.15% | 0.00% | 0.06% | 59.22% | 0.10% | 27.75% |
| 1 | 0.08% | 3.80% | 0.04% | 76.99% | 0.26% | 8.88% | 0.02% | 0.02% | 9.76% | 0.15% |
| 2 | 0.28% | 0.37% | 15.27% | 2.55% | 22.50% | 1.09% | 1.23% | 7.45% | 16.41% | 32.84% |
| 3 | 2.98% | 0.00% | 20.11% | 0.00% | 24.97% | 0.00% | 50.53% | 0.00% | 0.28% | 1.12% |
| 4 | 0.00% | 91.46% | 0.04% | 5.28% | 0.07% | 0.05% | 0.00% | 2.20% | 0.35% | 0.54% |
| 5 | 20.26% | 0.00% | 1.18% | 1.21% | 8.80% | 18.63% | 6.24% | 0.00% | 43.24% | 0.45% |
| 6 | 0.00% | 89.19% | 0.04% | 7.97% | 0.08% | 0.09% | 0.00% | 1.60% | 0.51% | 0.52% |
| 7 | 90.02% | 0.00% | 0.01% | 0.00% | 0.25% | 3.44% | 3.83% | 0.00% | 2.45% | 0.00% |
| 8 | 0.18% | 0.08% | 20.17% | 0.57% | 23.18% | 0.34% | 1.31% | 8.74% | 8.20% | 37.23% |
| 9 | 32.44% | 0.00% | 1.07% | 0.39% | 8.64% | 13.92% | 9.56% | 0.00% | 33.71% | 0.27% |
| 10 | 82.97% | 0.00% | 0.02% | 0.01% | 0.47% | 7.11% | 4.15% | 0.00% | 5.27% | 0.00% |
| 11 | 1.20% | 0.08% | 0.10% | 38.21% | 0.88% | 30.74% | 0.17% | 0.00% | 28.48% | 0.15% |
| 12 | 0.00% | 92.45% | 0.04% | 1.53% | 0.04% | 0.01% | 0.00% | 5.23% | 0.11% | 0.59% |
| 13 | 1.27% | 0.00% | 0.00% | 31.67% | 0.08% | 53.39% | 0.03% | 0.00% | 13.55% | 0.00% |
| 14 | 0.00% | 0.02% | 8.92% | 0.00% | 3.02% | 0.00% | 0.06% | 60.79% | 0.11% | 27.08% |
| 15 | 84.10% | 0.00% | 0.04% | 0.00% | 0.79% | 3.68% | 7.06% | 0.00% | 4.32% | 0.00% |
| 16 | 0.00% | 0.00% | 62.59% | 0.00% | 10.19% | 0.00% | 1.14% | 3.37% | 0.01% | 22.70% |
| 17 | 0.00% | 0.00% | 36.39% | 0.00% | 7.54% | 0.00% | 0.35% | 20.50% | 0.03% | 35.19% |
| 18 | 33.77% | 0.00% | 0.48% | 0.00% | 2.43% | 0.00% | 63.20% | 0.00% | 0.10% | 0.00% |
| 19 | 93.83% | 0.00% | 0.00% | 0.00% | 0.14% | 0.97% | 4.24% | 0.00% | 0.82% | 0.00% |
Visualize the confusion matrix¶
mtx = plot_confusion_matrix(train_labels,pred_classes)
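`plot_confusion_matrix` is another notebook helper; a minimal sketch, assuming scikit-learn and matplotlib (the heatmap styling is illustrative, not the notebook's actual implementation):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(y_true, y_pred):
    """Render the confusion matrix as an annotated heatmap; return the matrix."""
    mtx = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots(figsize=(8, 8))
    im = ax.imshow(mtx, cmap='Blues')
    fig.colorbar(im, ax=ax)
    # annotate each cell with its count
    for i in range(mtx.shape[0]):
        for j in range(mtx.shape[1]):
            ax.text(j, i, mtx[i, j], ha='center', va='center')
    ax.set_xlabel('predicted class')
    ax.set_ylabel('actual class')
    return mtx
```

Rows are actual classes and columns are predictions, so off-diagonal hot spots point directly at the class pairs examined below.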
cl_a, cl_b = 2, 3
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]
plt.figure(figsize=(8,8))
p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)
plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);
p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")
# plt.savefig("error_analysis_digits_plot_EXP1_valid")
plt.show()
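`plot_digits`, used for the error-analysis grids above, is defined earlier in the notebook. One plausible sketch, assuming each image is a flattened 784-vector as elsewhere in this notebook:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

def plot_digits(instances, ax, images_per_row=5, size=28):
    """Tile flattened size*size images into one grid and draw it on ax."""
    images = [img.reshape(size, size) for img in instances]
    n_rows = (len(images) - 1) // images_per_row + 1
    # pad with blank images so the last row is full
    images += [np.zeros((size, size))] * (n_rows * images_per_row - len(images))
    rows = [np.concatenate(images[r * images_per_row:(r + 1) * images_per_row], axis=1)
            for r in range(n_rows)]
    grid = np.concatenate(rows, axis=0)
    ax.imshow(grid, cmap='binary')
    ax.axis('off')
    return grid
```

Passing 25 images with `images_per_row=5` yields the 5x5 tiles shown in each quadrant.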
Get Activation Values of the Hidden Nodes (2)¶
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
print(f"There are {len(layer_outputs)} layers")
layer_outputs # description of the layers
There are 2 layers
[<KerasTensor: shape=(None, 2) dtype=float32 (created by layer 'hidden_layer_1')>, <KerasTensor: shape=(None, 10) dtype=float32 (created by layer 'output_layer')>]
# Get the outputs of the two hidden nodes for each of the 55000 training images
activations = activation_model.predict(train_images)
hidden_layer_activation = activations[0]
hidden_layer_activation.shape # each hidden node has one activation value per training image
1719/1719 [==============================] - 0s 219us/step
(55000, 2)
hidden_node1_activation = hidden_layer_activation[:,0] # get activation values of the first hidden node
hidden_node2_activation = hidden_layer_activation[:,1] # get activation values of the second hidden node
print(f"The maximum activation value of the first hidden node is {hidden_node1_activation.max()}")
print(f"The maximum activation value of the second hidden node is {hidden_node2_activation.max()}")
The maximum activation value of the first hidden node is 22.321428298950195
The maximum activation value of the second hidden node is 75.70663452148438
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True) # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output layer has shape {output_layer_activation.shape}")
print(f"The outputs for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output layer has shape (55000, 10)
The outputs for the first image are [0. 0. 0.097 0. 0.032 0. 0.001 0.592 0.001 0.278]
The sum of the probabilities is (approximately) 1.0
scatterPlot_df = pd.DataFrame({'act_value_h1':hidden_node1_activation,
'act_value_h2':hidden_node2_activation,
'pred_class':pred_classes})
scatterPlot_df.head()
| act_value_h1 | act_value_h2 | pred_class | |
|---|---|---|---|
| 0 | 0.000000 | 5.461303 | 7 |
| 1 | 4.111321 | 0.957018 | 3 |
| 2 | 2.540297 | 4.186171 | 9 |
| 3 | 6.147233 | 12.957750 | 6 |
| 4 | 0.769478 | 0.000000 | 1 |
# To see how closely the hidden nodes' activation values correlate with the class predictions
# Note that there were no 5s detected and that there were outliers for the activation values for the 6s
boxplot_df[['act_value','pred_class']].boxplot(by ='pred_class', column =['act_value'], grid = True)
<Axes: title={'center': 'act_value'}, xlabel='pred_class'>
#plt.legend(loc='upper left', prop={'size':6}, bbox_to_anchor=(1,1),ncol=1)
plt.scatter(scatterPlot_df.act_value_h1,
scatterPlot_df.act_value_h2,
c=scatterPlot_df.pred_class,
label=scatterPlot_df.pred_class)
plt.show()
# To see how closely the hidden node activation values correlate with the class labels
# Let us use seaborn for the boxplots this time.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_value', x='pred_class',
data=boxplot_df,
width=0.5,
palette="colorblind")
groups = scatterPlot_df.groupby('pred_class')
# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
ax.plot(group.act_value_h1, group.act_value_h2, marker='o', linestyle='', ms=12, label=name)
ax.legend()
plt.show()
Experiment 3¶
- 784 Input Nodes
- hidden layer: 128 nodes
- output layer: 10 nodes
Build the DNN model¶
# k.clear_session()
model = Sequential([
    Dense(name='hidden_layer_1', units=128, activation='relu', input_shape=[784]),
    Dense(name='output_layer', units=10, activation='softmax')
])
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
hidden_layer_1 (Dense) (None, 128) 100480
output_layer (Dense) (None, 10) 1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Compile the DNN model¶
model.compile(optimizer='rmsprop',
loss = 'sparse_categorical_crossentropy',
metrics=['accuracy'])
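`sparse_categorical_crossentropy` is used because `train_labels` are integer class indices (0-9); with one-hot encoded labels you would use `categorical_crossentropy` instead. A small illustration of the two label formats (plain NumPy, no Keras required; the variable names are illustrative):

```python
import numpy as np

labels = np.array([7, 3, 4])   # sparse format: integer class indices
one_hot = np.eye(10)[labels]   # categorical format: one-hot rows, a 1 per true class

# Both formats describe the same targets; only the loss name differs in Keras.
print(one_hot[0])  # row with a 1 in position 7
```

Keeping the labels sparse avoids materializing a 55000x10 one-hot matrix.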
Train the DNN model¶
history = model.fit(train_images
, train_labels
, epochs=20
, validation_data=(val_images, val_labels)
, callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_3_optimized.h5",save_best_only=True,save_weights_only=False)]
)
Epoch 1/20
1719/1719 [==============================] - 1s 605us/step - loss: 0.2735 - accuracy: 0.9216 - val_loss: 0.1381 - val_accuracy: 0.9632
Epoch 2/20
1719/1719 [==============================] - 1s 584us/step - loss: 0.1292 - accuracy: 0.9617 - val_loss: 0.1016 - val_accuracy: 0.9712
Epoch 3/20
1719/1719 [==============================] - 1s 585us/step - loss: 0.0939 - accuracy: 0.9725 - val_loss: 0.1014 - val_accuracy: 0.9706
Epoch 4/20
1719/1719 [==============================] - 1s 586us/step - loss: 0.0738 - accuracy: 0.9786 - val_loss: 0.0811 - val_accuracy: 0.9776
... (epochs 5-17 omitted: training loss keeps falling while validation loss bottoms out near epoch 4 and then rises) ...
Epoch 18/20
1719/1719 [==============================] - 1s 597us/step - loss: 0.0139 - accuracy: 0.9966 - val_loss: 0.1247 - val_accuracy: 0.9760
Epoch 19/20
1719/1719 [==============================] - 1s 594us/step - loss: 0.0123 - accuracy: 0.9971 - val_loss: 0.1218 - val_accuracy: 0.9770
Epoch 20/20
1719/1719 [==============================] - 1s 595us/step - loss: 0.0109 - accuracy: 0.9973 - val_loss: 0.1282 - val_accuracy: 0.9770
model = tf.keras.models.load_model("exp_3_optimized.h5")
Evaluate the DNN model¶
test_loss, test_acc = model.evaluate(test_images, test_labels)
313/313 [==============================] - 0s 332us/step - loss: 0.0873 - accuracy: 0.9753
print(f'test acc: {test_acc}, test loss: {test_loss}')
test acc: 0.9753000140190125, test loss: 0.08734780550003052
Plot performance metrics¶
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
Making Predictions¶
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+;
# use model.predict() followed by np.argmax() instead:
pred_train = model.predict(train_images)
pred_classes = np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 270us/step
print_validation_report(train_labels, pred_classes)
Classification Report
precision recall f1-score support
0 0.99 1.00 0.99 5444
1 0.99 0.99 0.99 6179
2 0.98 0.99 0.98 5470
3 0.98 0.98 0.98 5638
4 0.98 0.99 0.99 5307
5 0.99 0.98 0.98 4987
6 0.99 0.99 0.99 5417
7 0.99 0.99 0.99 5715
8 0.98 0.98 0.98 5389
9 0.98 0.98 0.98 5454
accuracy 0.99 55000
macro avg 0.99 0.99 0.99 55000
weighted avg 0.99 0.99 0.99 55000
Accuracy Score: 0.9862
Root Mean Square Error: 0.5036051844631684
Create the confusion matrix¶
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00% | 0.00% | 0.11% | 0.93% | 0.00% | 0.00% | 0.00% | 98.96% | 0.00% | 0.00% |
| 1 | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 2 | 0.00% | 0.00% | 0.00% | 0.01% | 70.95% | 0.00% | 0.00% | 0.01% | 0.02% | 29.02% |
| 3 | 0.04% | 0.00% | 0.26% | 0.00% | 0.04% | 0.06% | 99.62% | 0.00% | 0.00% | 0.00% |
| 4 | 0.00% | 99.93% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.01% | 0.05% | 0.00% |
| 5 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% |
| 6 | 0.00% | 99.87% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.12% | 0.00% |
| 7 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 8 | 0.00% | 0.00% | 0.00% | 0.03% | 0.08% | 0.00% | 0.00% | 0.00% | 0.00% | 99.88% |
| 9 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 99.99% | 0.00% |
| 10 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 11 | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 12 | 0.00% | 99.97% | 0.00% | 0.01% | 0.00% | 0.00% | 0.00% | 0.00% | 0.02% | 0.00% |
| 13 | 0.05% | 0.00% | 44.22% | 39.71% | 0.00% | 15.63% | 0.00% | 0.35% | 0.01% | 0.02% |
| 14 | 0.00% | 0.00% | 0.00% | 0.03% | 0.00% | 0.00% | 0.00% | 99.97% | 0.00% | 0.00% |
| 15 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 16 | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 17 | 0.00% | 0.00% | 0.00% | 0.08% | 0.00% | 0.00% | 0.00% | 0.00% | 0.01% | 99.91% |
| 18 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% |
| 19 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
Visualize the confusion matrix¶
mtx = plot_confusion_matrix(train_labels,pred_classes)
Most problematic classifications (actual, predicted):
- 7, 9
- 4, 9
- 5, 6
cl_a, cl_b = 7, 9
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]
plt.figure(figsize=(8,8))
p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)
plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);
p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")
# plt.savefig("error_analysis_digits_plot_EXP1_valid")
plt.show()
cl_a, cl_b = 4, 9
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]
plt.figure(figsize=(8,8))
p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)
plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);
p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")
# plt.savefig("error_analysis_digits_plot_EXP1_valid")
plt.show()
cl_a, cl_b = 5, 6
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]
plt.figure(figsize=(8,8))
p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)
plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);
p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")
# plt.savefig("error_analysis_digits_plot_EXP1_valid")
plt.show()
Get Activation Values of the Hidden Nodes (128)¶
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
print(f"There are {len(layer_outputs)} layers")
layer_outputs; # description of the layers
There are 2 layers
# Get the outputs of all the hidden nodes for each of the 55000 training images
activations = activation_model.predict(train_images)
hidden_layer_activation = activations[0]
output_layer_activations = activations[1]
hidden_layer_activation.shape # each of the 128 hidden nodes has one activation value per training image
1719/1719 [==============================] - 1s 348us/step
(55000, 128)
output_layer_activations.shape
(55000, 10)
print(f"The maximum activation value of the hidden nodes in the hidden layer is \
{hidden_layer_activation.max()}")
The maximum activation value of the hidden nodes in the hidden layer is 16.965179443359375
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True) # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output layer has shape {output_layer_activation.shape}")
print(f"The outputs for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output layer has shape (55000, 10)
The outputs for the first image are [0. 0. 0.001 0.009 0. 0. 0. 0.99 0. 0. ]
The sum of the probabilities is (approximately) 1.0
Create a dataframe with the activation values and the class labels¶
#Get the dataframe of all the node values
activation_data = {'actual_class':train_labels}
for k in range(0,128):
activation_data[f"act_val_{k}"] = hidden_layer_activation[:,k]
activation_df = pd.DataFrame(activation_data)
activation_df.head(15).round(3).T
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| actual_class | 7.000 | 3.000 | 4.000 | 6.000 | 1.000 | 8.000 | 1.000 | 0.000 | 9.000 | 8.000 | 0.000 | 3.000 | 1.000 | 2.000 | 7.000 |
| act_val_0 | 0.000 | 3.711 | 0.187 | 0.000 | 0.000 | 3.741 | 0.000 | 0.000 | 0.193 | 2.811 | 3.123 | 2.113 | 0.000 | 0.059 | 0.000 |
| act_val_1 | 0.000 | 3.684 | 2.029 | 0.000 | 0.633 | 1.451 | 0.668 | 0.000 | 0.916 | 2.311 | 0.000 | 3.573 | 0.819 | 0.000 | 0.000 |
| act_val_2 | 0.000 | 0.000 | 0.000 | 4.265 | 1.466 | 0.000 | 1.405 | 1.154 | 0.000 | 0.690 | 0.000 | 2.623 | 1.461 | 0.000 | 0.000 |
| act_val_3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.425 | 0.000 | 0.000 | 1.125 | 0.000 | 0.000 | 0.000 | 1.647 | 0.114 | 2.469 | 0.000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| act_val_123 | 1.016 | 1.090 | 0.405 | 0.000 | 2.669 | 0.000 | 3.985 | 0.000 | 1.364 | 0.589 | 0.000 | 2.577 | 3.305 | 0.000 | 4.729 |
| act_val_124 | 0.000 | 0.000 | 0.000 | 0.000 | 1.227 | 0.000 | 1.308 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.167 | 0.000 | 0.007 |
| act_val_125 | 0.217 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.915 | 0.000 | 0.000 | 0.000 | 0.000 |
| act_val_126 | 0.536 | 2.282 | 1.825 | 3.017 | 1.237 | 2.934 | 1.666 | 5.814 | 2.024 | 5.022 | 5.104 | 3.167 | 0.970 | 4.330 | 2.374 |
| act_val_127 | 0.000 | 1.119 | 0.000 | 0.000 | 0.000 | 1.196 | 0.000 | 0.024 | 0.000 | 2.759 | 1.427 | 3.034 | 0.000 | 0.000 | 0.000 |
129 rows × 15 columns
Visualize the activation values with boxplots¶
# To see how closely the hidden node activation values correlate with the class labels
# Let us use seaborn for the boxplots this time.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_val_0', x='actual_class',
data=activation_df[['act_val_0','actual_class']],
width=0.5,
palette="colorblind")
Displaying The Range Of Activation Values For Each Class Labels¶
activation_df.groupby("actual_class")["act_val_0"].apply(lambda x: [round(min(x.tolist()),2),
round(max(x.tolist()),2)]).reset_index().rename(columns={"act_val_0": "range_of_act_values"})
| actual_class | range_of_act_values | |
|---|---|---|
| 0 | 0 | [0.0, 4.64] |
| 1 | 1 | [0.0, 4.84] |
| 2 | 2 | [0.0, 7.74] |
| 3 | 3 | [0.0, 6.48] |
| 4 | 4 | [0.0, 3.69] |
| 5 | 5 | [0.0, 3.39] |
| 6 | 6 | [0.0, 3.48] |
| 7 | 7 | [0.0, 6.1] |
| 8 | 8 | [0.0, 5.27] |
| 9 | 9 | [0.0, 4.54] |
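The same per-class range can be computed a bit more directly with `groupby().agg`, which avoids the lambda and the list-valued column. A sketch on a hypothetical miniature `activation_df` with the same column names:

```python
import pandas as pd

# hypothetical stand-in for the notebook's activation_df
activation_df = pd.DataFrame({
    'actual_class': [0, 0, 1, 1],
    'act_val_0':    [0.0, 4.64, 0.0, 4.84],
})

# one row per class, with separate numeric min/max columns
range_df = (activation_df.groupby('actual_class')['act_val_0']
            .agg(['min', 'max'])
            .round(2)
            .reset_index())
print(range_df)
```

Separate `min`/`max` columns also remain sortable and plottable, unlike a column of two-element lists.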
Get Activation Values of the Pixel Values (784)¶
Create a dataframe with the pixel values and class labels¶
#Get the dataframe of all the pixel values
pixel_data = {'actual_class':train_labels}
for k in range(0,784):
pixel_data[f"pix_val_{k}"] = train_images[:,k]
pixel_df = pd.DataFrame(pixel_data)
pixel_df.head(15).round(3).T
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| actual_class | 7.0 | 3.0 | 4.0 | 6.0 | 1.0 | 8.0 | 1.0 | 0.0 | 9.0 | 8.0 | 0.0 | 3.0 | 1.0 | 2.0 | 7.0 |
| pix_val_0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| pix_val_1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| pix_val_2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| pix_val_3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| pix_val_779 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| pix_val_780 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| pix_val_781 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| pix_val_782 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| pix_val_783 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
785 rows × 15 columns
pixel_df.pix_val_77.value_counts()
pix_val_77
0.000000 54741
1.000000 24
0.996078 10
0.992157 9
0.050980 5
...
0.670588 1
0.858824 1
0.239216 1
0.839216 1
0.819608 1
Name: count, Length: 143, dtype: int64
pixel_df.pix_val_78.value_counts()
pix_val_78
0.000000 54871
1.000000 5
0.992157 4
0.960784 4
0.098039 3
...
0.047059 1
0.741176 1
0.568627 1
0.023529 1
0.501961 1
Name: count, Length: 92, dtype: int64
Use a scatter plot to visualize the predictive power of the pixel values at two fixed locations in the image¶
We use a scatter plot to see how well the pixel values at two fixed locations (pix_val_77 and pix_val_78) "predict" the actual_class labels.
plt.figure(figsize=(16, 10))
color = sns.color_palette("hls", 10)
sns.scatterplot(x="pix_val_77", y="pix_val_78", hue="actual_class", palette=color, data = pixel_df, legend="full")
plt.legend(loc='upper left');
PCA Feature Reduction / Model Optimization¶
Use PCA decomposition to reduce the number of features from 784 features to 2 features¶
# Separating out the features
features = [*pixel_data][1:] # ['pix_val_0', 'pix_val_1',...]
x = pixel_df.loc[:, features].values
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['principal component 1', 'principal component 2'])
pixel_pca_df = pd.concat([principalDf, pixel_df[['actual_class']]], axis = 1)
pixel_pca_df.head().round(3)
| principal component 1 | principal component 2 | actual_class | |
|---|---|---|---|
| 0 | 0.725 | -2.433 | 7 |
| 1 | 0.473 | 1.005 | 3 |
| 2 | -0.094 | -3.010 | 4 |
| 3 | 0.221 | -0.725 | 6 |
| 4 | -3.680 | 2.086 | 1 |
pca.explained_variance_ratio_
array([0.097, 0.071], dtype=float32)
Use a scatter plot to visualize the predictive power of the two principal component values.¶
plt.figure(figsize=(16,10))
sns.scatterplot(
x="principal component 1", y="principal component 2",
hue="actual_class",
palette=sns.color_palette("hls", 10),
data=pixel_pca_df,
legend="full",
alpha=0.3
);
Use PCA decomposition to reduce the (activation) features from 128 (= num of hidden nodes) to 2¶
# Separating out the features
features = [*activation_data][1:] # ['act_val_0', 'act_val_1',...]
x = activation_df.loc[:, features].values
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['principal component 1', 'principal component 2'])
principalDf.head().round(3)
| principal component 1 | principal component 2 | |
|---|---|---|
| 0 | 1.218 | -6.660 |
| 1 | -4.480 | -0.455 |
| 2 | -3.130 | -5.158 |
| 3 | 2.366 | 4.796 |
| 4 | -6.708 | 4.235 |
activation_pca_df = pd.concat([principalDf, activation_df[['actual_class']]], axis = 1)
activation_pca_df.head().round(3)
| principal component 1 | principal component 2 | actual_class | |
|---|---|---|---|
| 0 | 1.218 | -6.660 | 7 |
| 1 | -4.480 | -0.455 | 3 |
| 2 | -3.130 | -5.158 | 4 |
| 3 | 2.366 | 4.796 | 6 |
| 4 | -6.708 | 4.235 | 1 |
ev=pca.explained_variance_ratio_
ev
array([0.172, 0.117], dtype=float32)
print(f'The {len(ev)} principal components together explain {ev[0]:.3f} + {ev[1]:.3f} = {sum(ev):.3f} of the variance')
The 2 principal components together explain 0.172 + 0.117 = 0.289 of the variance
Use a scatter plot to visualize the predictive power of two principal component values.¶
plt.figure(figsize=(16,10))
sns.scatterplot(
x="principal component 1", y="principal component 2",
hue="actual_class",
palette=sns.color_palette("hls", 10),
data=activation_pca_df,
legend="full",
alpha=0.3
);
Use PCA decomposition to reduce the (activation) features from 128 (= num of hidden nodes) to 3¶
# Separating out the features
features = [*activation_data][1:] # ['act_val_0', 'act_val_1',...]
x = activation_df.loc[:, features].values
pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['pca-one', 'pca-two', 'pca-three'])
principalDf.head(10).round(3).T
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| pca-one | 1.218 | -4.480 | -3.130 | 2.366 | -6.708 | -0.023 | -7.246 | 13.832 | -1.936 | 1.288 |
| pca-two | -6.660 | -0.455 | -5.158 | 4.796 | 4.235 | 2.463 | 4.553 | -2.232 | -4.552 | 3.532 |
| pca-three | -0.856 | 9.390 | -1.377 | -4.507 | -1.830 | 2.874 | -1.018 | -0.319 | -1.956 | 4.567 |
ev=pca.explained_variance_ratio_
ev
array([0.172, 0.117, 0.099], dtype=float32)
print(f'The {len(ev)} principal components summed together {ev[0]:.3f} + {ev[1]:.3f} + {ev[2]:.3f} = {sum(ev):.3f} explained variance')
The 3 principal components summed together 0.172 + 0.117 + 0.099 = 0.389 explained variance
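Two or three components explain well under half of the variance here. To see how many components a given variance target would need, one can inspect the cumulative explained-variance ratio. A minimal sketch on synthetic (illustrative) data, assuming a 0.95 target:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# 200 samples whose 10 features are driven by 3 latent directions plus a little noise
base = rng.normal(size=(200, 3))
X = base @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))

pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
# smallest number of components whose cumulative ratio reaches 95%
n_95 = int(np.argmax(cum >= 0.95)) + 1
print(n_95, cum[:n_95].round(3))
```

`np.argmax` on the boolean array returns the first index where the cumulative ratio crosses the threshold.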
activation_pca_df = pd.concat([principalDf, activation_df[['actual_class']]], axis = 1)
activation_pca_df.head().round(3)
| | pca-one | pca-two | pca-three | actual_class |
|---|---|---|---|---|
| 0 | 1.218 | -6.660 | -0.856 | 7 |
| 1 | -4.480 | -0.455 | 9.390 | 3 |
| 2 | -3.130 | -5.158 | -1.377 | 4 |
| 3 | 2.366 | 4.796 | -4.507 | 6 |
| 4 | -6.708 | 4.235 | -1.830 | 1 |
Use t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the (activation) features from 128 (= num of hidden nodes) to 2¶
t-Distributed Stochastic Neighbor Embedding (t-SNE) is another technique for dimensionality reduction and is particularly well suited to visualizing high-dimensional datasets. Although the technique is computationally expensive, here we apply it to the activations of all 55,000 training images (N=55000).
See http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
activation_df.shape
(55000, 129)
N=55000
activation_df_subset = activation_df.iloc[:N].copy()
activation_df_subset.shape
(55000, 129)
data_subset = activation_df_subset[features].values
data_subset.shape
(55000, 128)
%%time
tsne = TSNE(n_components=2 # embed the 128 hidden-node activations in 2 dimensions
            ,init='pca'
            ,learning_rate='auto'
            ,verbose=1
            ,perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(data_subset)
[t-SNE] Computing 121 nearest neighbors... [t-SNE] Indexed 55000 samples in 0.012s... [t-SNE] Computed neighbors for 55000 samples in 6.394s... [t-SNE] Computed conditional probabilities for sample 1000 / 55000 ... [t-SNE] Computed conditional probabilities for sample 55000 / 55000 [t-SNE] Mean sigma: 2.328261 [t-SNE] KL divergence after 250 iterations with early exaggeration: 88.776878 [t-SNE] KL divergence after 300 iterations: 3.853568 CPU times: user 3min 11s, sys: 8.89 s, total: 3min 20s Wall time: 2min 26s
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
tsne_results
array([[0.341, 0.373],
[0.469, 0.732],
[0.611, 0.156],
...,
[0.684, 0.482],
[0.729, 0.874],
[0.226, 0.58 ]], dtype=float32)
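Note that the rescaling above uses a single global min and max over both embedding axes, which maps all coordinates into [0, 1] while preserving the embedding's aspect ratio; a per-axis min-max would stretch each axis independently. A small sketch with made-up coordinates to illustrate the difference:

```python
import numpy as np

pts = np.array([[0., 0.], [10., 2.], [5., 1.]])

# global min-max (as in the cell above): one min/max over all entries
global_scaled = (pts - pts.min()) / (pts.max() - pts.min())

# per-axis min-max: each column is stretched to [0, 1] independently
per_axis = (pts - pts.min(axis=0)) / (pts.max(axis=0) - pts.min(axis=0))

print(global_scaled)  # [[0. 0.], [1. 0.2], [0.5 0.1]]
print(per_axis)       # [[0. 0.], [1. 1. ], [0.5 0.5]]
```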
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
plt.scatter(tsne_results[:,0],tsne_results[:,1], c=train_labels, s=10, cmap=cmap)
image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
dist = np.sum((position - image_positions) ** 2, axis=1)
if np.min(dist) > 0.02: # if far enough from other images
image_positions = np.r_[image_positions, [position]]
imagebox = mpl.offsetbox.AnnotationBbox(
mpl.offsetbox.OffsetImage(train_images[index].reshape(28,28), cmap="binary"),
position, bboxprops={"edgecolor": cmap(train_labels[index]), "lw": 2})
plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()
Experiment 4¶
- 28x28 images (784 pixels) reduced to 154 input nodes via PCA (n_components=0.95)
- hidden layer: 85 nodes
- output layer: 10 nodes
pca = PCA(n_components=0.95)
train_images_red = pca.fit_transform(train_images)
val_images_red = pca.transform(val_images)
test_images_red = pca.transform(test_images)
test_images_red.shape, train_images_red.shape, val_images_red.shape
((10000, 154), (55000, 154), (5000, 154))
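As an aside on PCA(n_components=0.95): passing a float between 0 and 1 tells scikit-learn to keep the smallest number of components whose cumulative explained variance reaches that fraction (here, the 154 components above); the fitted count is exposed as pca.n_components_. A minimal sketch on synthetic low-rank data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# low-rank data: 5 latent directions embedded in 50 features, plus small noise
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(500, 50))

pca = PCA(n_components=0.95).fit(X)
print(pca.n_components_)  # number of components actually kept
print(pca.explained_variance_ratio_.sum())  # at least 0.95
```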
Build the DNN model¶
#k.clear_session()
model = Sequential([
    Dense(name='hidden_layer_1', units=85, activation='relu', input_shape=[154]),
    Dense(name='output_layer', units=10, activation='softmax')
])
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
hidden_layer_1 (Dense) (None, 85) 13175
output_layer (Dense) (None, 10) 860
=================================================================
Total params: 14,035
Trainable params: 14,035
Non-trainable params: 0
_________________________________________________________________
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Compile the DNN model¶
model.compile(optimizer='rmsprop',
loss = 'sparse_categorical_crossentropy',
metrics=['accuracy'])
Train the DNN model¶
history = model.fit(train_images_red
, train_labels
, epochs=15
, validation_data=(val_images_red, val_labels)
, callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_4_optimized.h5",save_best_only=True,save_weights_only=False)]
)
Epoch 1/15 1719/1719 - 1s 353us/step - loss: 0.3471 - accuracy: 0.9046 - val_loss: 0.1584 - val_accuracy: 0.9550
Epoch 2/15 1719/1719 - 1s 320us/step - loss: 0.1367 - accuracy: 0.9594 - val_loss: 0.1133 - val_accuracy: 0.9676
Epoch 3/15 1719/1719 - 1s 328us/step - loss: 0.0962 - accuracy: 0.9717 - val_loss: 0.0981 - val_accuracy: 0.9718
Epoch 4/15 1719/1719 - 1s 339us/step - loss: 0.0736 - accuracy: 0.9789 - val_loss: 0.0877 - val_accuracy: 0.9736
Epoch 5/15 1719/1719 - 1s 340us/step - loss: 0.0597 - accuracy: 0.9831 - val_loss: 0.0834 - val_accuracy: 0.9756
Epoch 6/15 1719/1719 - 1s 345us/step - loss: 0.0489 - accuracy: 0.9862 - val_loss: 0.0815 - val_accuracy: 0.9780
Epoch 7/15 1719/1719 - 1s 346us/step - loss: 0.0406 - accuracy: 0.9890 - val_loss: 0.0811 - val_accuracy: 0.9784
Epoch 8/15 1719/1719 - 1s 341us/step - loss: 0.0343 - accuracy: 0.9908 - val_loss: 0.0798 - val_accuracy: 0.9792
Epoch 9/15 1719/1719 - 1s 344us/step - loss: 0.0289 - accuracy: 0.9926 - val_loss: 0.0800 - val_accuracy: 0.9788
Epoch 10/15 1719/1719 - 1s 329us/step - loss: 0.0248 - accuracy: 0.9936 - val_loss: 0.0804 - val_accuracy: 0.9816
Epoch 11/15 1719/1719 - 1s 321us/step - loss: 0.0210 - accuracy: 0.9950 - val_loss: 0.0821 - val_accuracy: 0.9798
Epoch 12/15 1719/1719 - 1s 323us/step - loss: 0.0176 - accuracy: 0.9960 - val_loss: 0.0866 - val_accuracy: 0.9790
Epoch 13/15 1719/1719 - 1s 323us/step - loss: 0.0153 - accuracy: 0.9966 - val_loss: 0.0862 - val_accuracy: 0.9800
Epoch 14/15 1719/1719 - 1s 323us/step - loss: 0.0131 - accuracy: 0.9976 - val_loss: 0.0867 - val_accuracy: 0.9798
Epoch 15/15 1719/1719 - 1s 324us/step - loss: 0.0111 - accuracy: 0.9978 - val_loss: 0.0940 - val_accuracy: 0.9794
model = tf.keras.models.load_model("exp_4_optimized.h5")
Evaluate the DNN model¶
test_loss, test_acc = model.evaluate(test_images_red, test_labels)
313/313 [==============================] - 0s 266us/step - loss: 0.0824 - accuracy: 0.9766
print(f'test acc: {test_acc}, test loss: {test_loss}')
test acc: 0.9765999913215637, test loss: 0.08242236077785492
Plot performance metrics¶
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
Making Predictions¶
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+;
# instead, take the argmax over the predicted class probabilities:
pred_train = model.predict(train_images_red)
pred_classes = np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 201us/step
print_validation_report(train_labels, pred_classes)
Classification Report
precision recall f1-score support
0 0.99 1.00 1.00 5444
1 1.00 1.00 1.00 6179
2 0.99 0.99 0.99 5470
3 1.00 0.99 0.99 5638
4 1.00 1.00 1.00 5307
5 1.00 1.00 1.00 4987
6 1.00 1.00 1.00 5417
7 0.99 1.00 0.99 5715
8 0.99 0.99 0.99 5389
9 1.00 0.99 0.99 5454
accuracy 0.99 55000
macro avg 0.99 0.99 0.99 55000
weighted avg 0.99 0.99 0.99 55000
Accuracy Score: 0.9946181818181818
Root Mean Square Error: 0.33780037138684577
Create the confusion matrix¶
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00% | 0.00% | 0.00% | 8.19% | 0.00% | 0.00% | 0.00% | 91.81% | 0.00% | 0.00% |
| 1 | 0.00% | 0.00% | 0.00% | 99.98% | 0.00% | 0.00% | 0.00% | 0.00% | 0.01% | 0.01% |
| 2 | 0.00% | 0.00% | 0.00% | 0.00% | 83.03% | 0.00% | 0.00% | 0.05% | 0.01% | 16.92% |
| 3 | 0.00% | 0.00% | 0.01% | 0.00% | 0.00% | 0.00% | 99.99% | 0.00% | 0.00% | 0.00% |
| 4 | 0.00% | 99.95% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.05% | 0.00% |
| 5 | 0.00% | 0.00% | 0.01% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 99.99% | 0.00% |
| 6 | 0.00% | 99.88% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.12% | 0.00% |
| 7 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 8 | 0.00% | 0.00% | 0.00% | 0.00% | 0.03% | 0.01% | 0.00% | 0.02% | 0.00% | 99.94% |
| 9 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% |
| 10 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 11 | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 12 | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 13 | 0.00% | 0.00% | 92.31% | 7.65% | 0.00% | 0.03% | 0.00% | 0.01% | 0.00% | 0.00% |
| 14 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% |
| 15 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 16 | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 17 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.05% | 0.01% | 99.94% |
| 18 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% |
| 19 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
Visualize the confusion matrix¶
mtx = plot_confusion_matrix(train_labels,pred_classes)
cl_a, cl_b = 9, 4
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]
plt.figure(figsize=(8,8))
p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)
plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);
p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")
# plt.savefig("error_analysis_digits_plot_EXP1_valid")
plt.show()
Get Activation Values of the Hidden Nodes (85)¶
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
print(f"There are {len(layer_outputs)} layers")
layer_outputs; # description of the layers
There are 2 layers
# Get the outputs of all the hidden nodes for each of the 55000 training images
activations = activation_model.predict(train_images_red)
hidden_layer_activation = activations[0]
output_layer_activations = activations[1]
hidden_layer_activation.shape # each of the 85 hidden nodes has one activation value per training image
1719/1719 [==============================] - 0s 254us/step
(55000, 85)
output_layer_activations.shape
(55000, 10)
print(f"The maximum activation value of the hidden nodes in the hidden layer is \
{hidden_layer_activation.max()}")
The maximum activation value of the hidden nodes in the hidden layer is 9.398319244384766
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True) # display probabilities as decimals and NOT in scientific notation
print(f"The output layer activations have shape {output_layer_activations.shape}")
print(f"The outputs for the first image are {output_layer_activations[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activations[0].sum()}")
The output layer activations have shape (55000, 10) The outputs for the first image are [0. 0. 0. 0.082 0. 0. 0. 0.918 0. 0. ] The sum of the probabilities is (approximately) 1.0
Create a dataframe with the activation values and the class labels¶
#Get the dataframe of all the node values
activation_data = {'actual_class':train_labels}
for k in range(0,85):
activation_data[f"act_val_{k}"] = hidden_layer_activation[:,k]
activation_df = pd.DataFrame(activation_data)
activation_df.head(15).round(3).T
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| actual_class | 7.000 | 3.000 | 4.000 | 6.000 | 1.000 | 8.000 | 1.000 | 0.000 | 9.000 | 8.000 | 0.000 | 3.000 | 1.000 | 2.000 | 7.000 |
| act_val_0 | 0.000 | 0.629 | 0.000 | 0.000 | 0.000 | 0.422 | 1.384 | 0.212 | 0.000 | 0.000 | 0.000 | 0.000 | 0.351 | 2.546 | 1.246 |
| act_val_1 | 0.213 | 1.582 | 0.919 | 0.145 | 0.871 | 1.471 | 1.214 | 0.000 | 0.758 | 1.866 | 0.000 | 2.395 | 1.259 | 0.000 | 1.608 |
| act_val_2 | 0.877 | 1.153 | 0.000 | 0.178 | 0.000 | 0.664 | 0.295 | 0.000 | 0.000 | 0.000 | 0.254 | 0.594 | 0.000 | 2.288 | 1.363 |
| act_val_3 | 0.000 | 0.000 | 1.905 | 0.995 | 0.000 | 0.000 | 0.000 | 0.996 | 0.683 | 1.919 | 1.783 | 0.292 | 0.000 | 0.000 | 0.086 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| act_val_80 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.765 | 0.000 | 2.322 | 0.000 | 0.382 | 2.768 | 0.511 | 0.000 | 2.191 | 0.000 |
| act_val_81 | 2.018 | 3.580 | 1.706 | 0.000 | 0.948 | 0.000 | 0.988 | 0.000 | 2.274 | 0.745 | 0.437 | 1.219 | 0.976 | 0.879 | 1.930 |
| act_val_82 | 0.000 | 0.000 | 0.073 | 0.000 | 1.834 | 0.000 | 2.021 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.470 | 0.000 | 2.294 |
| act_val_83 | 0.000 | 0.000 | 0.000 | 1.118 | 4.013 | 0.000 | 3.589 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 3.767 | 0.000 | 0.997 |
| act_val_84 | 0.000 | 0.000 | 2.170 | 1.310 | 0.000 | 0.000 | 0.000 | 1.618 | 1.125 | 1.628 | 0.814 | 0.696 | 0.000 | 0.238 | 0.000 |
86 rows × 15 columns
Visualize the activation values with boxplots¶
# To see how closely the hidden node activation values correlate with the class labels
# Let us use seaborn for the boxplots this time.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_val_0', x='actual_class',
data=activation_df[['act_val_0','actual_class']],
width=0.5,
palette="colorblind")
Displaying The Range Of Activation Values For Each Class Labels¶
activation_df.groupby("actual_class")["act_val_0"].apply(lambda x: [round(min(x.tolist()),2),
round(max(x.tolist()),2)]).reset_index().rename(columns={"act_val_0": "range_of_act_values"})
| | actual_class | range_of_act_values |
|---|---|---|
| 0 | 0 | [0.0, 5.56] |
| 1 | 1 | [0.0, 5.25] |
| 2 | 2 | [0.0, 5.23] |
| 3 | 3 | [0.0, 6.06] |
| 4 | 4 | [0.0, 3.04] |
| 5 | 5 | [0.0, 8.19] |
| 6 | 6 | [0.0, 6.19] |
| 7 | 7 | [0.0, 4.49] |
| 8 | 8 | [0.0, 5.89] |
| 9 | 9 | [0.0, 4.02] |
Get the Input Feature Values (154 PCA components)¶
Create a dataframe with the reduced input values and class labels¶
# Build a dataframe of the 154 PCA-reduced input values (named pix_val_k) plus the class labels
pixel_data = {'actual_class':train_labels}
for k in range(0,154):
pixel_data[f"pix_val_{k}"] = train_images_red[:,k]
pixel_df = pd.DataFrame(pixel_data)
pixel_df.head(15).round(3).T
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| actual_class | 7.000 | 3.000 | 4.000 | 6.000 | 1.000 | 8.000 | 1.000 | 0.000 | 9.000 | 8.000 | 0.000 | 3.000 | 1.000 | 2.000 | 7.000 |
| pix_val_0 | 0.725 | 0.473 | -0.094 | 0.221 | -3.679 | 1.303 | -3.645 | 6.441 | -0.511 | 1.735 | 5.038 | 0.703 | -3.438 | 3.187 | -1.907 |
| pix_val_1 | -2.433 | 1.005 | -3.010 | -0.725 | 2.086 | 0.938 | 2.637 | 0.618 | -2.159 | 2.033 | 1.001 | 2.985 | 0.737 | 1.372 | -1.918 |
| pix_val_2 | 1.537 | 0.502 | 2.129 | -2.279 | -0.551 | -1.222 | -0.458 | -1.207 | 2.395 | 0.174 | 1.917 | 1.507 | 0.137 | 0.066 | 0.498 |
| pix_val_3 | -2.445 | 3.738 | 0.838 | -1.903 | -0.906 | 2.802 | -0.120 | 0.589 | -0.949 | 1.990 | -0.634 | 1.158 | -0.161 | 2.586 | -0.127 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| pix_val_149 | 0.242 | 0.161 | 0.033 | -0.102 | -0.082 | -0.114 | 0.094 | 0.166 | -0.262 | -0.170 | 0.033 | -0.044 | -0.011 | -0.184 | -0.051 |
| pix_val_150 | -0.115 | -0.105 | -0.006 | 0.040 | -0.061 | 0.050 | -0.130 | -0.191 | -0.010 | -0.439 | 0.139 | 0.018 | 0.115 | 0.028 | -0.050 |
| pix_val_151 | -0.367 | -0.218 | 0.013 | 0.048 | -0.117 | -0.117 | 0.116 | -0.141 | 0.022 | -0.228 | -0.135 | 0.150 | 0.002 | 0.108 | -0.111 |
| pix_val_152 | -0.063 | 0.148 | 0.125 | -0.071 | 0.007 | 0.083 | -0.046 | 0.120 | 0.017 | 0.283 | -0.077 | 0.002 | 0.080 | 0.006 | 0.256 |
| pix_val_153 | -0.264 | 0.007 | -0.035 | 0.028 | 0.021 | -0.087 | 0.055 | 0.071 | 0.183 | 0.041 | -0.132 | 0.142 | -0.173 | 0.105 | 0.268 |
155 rows × 15 columns
pixel_df.pix_val_77.value_counts()
pix_val_77
0.233526 2
0.222589 2
-0.106621 2
0.335311 2
0.372639 2
..
-0.097114 1
-0.040634 1
0.232502 1
0.125765 1
0.271186 1
Name: count, Length: 54958, dtype: int64
pixel_df.pix_val_78.value_counts()
pix_val_78
0.301367 2
-0.072439 2
0.184436 2
0.491759 2
-0.225153 2
..
-0.054373 1
-0.579701 1
0.121835 1
0.007295 1
-0.239521 1
Name: count, Length: 54980, dtype: int64
Use a scatter plot to visualize the predictive power of two fixed input features, i.e. how well the values of two of the 154 PCA-reduced inputs "predict" the class labels.¶
We use a scatter plot to examine how the pix_val_77 and pix_val_78 values correlate with the actual_class labels.
plt.figure(figsize=(16, 10))
color = sns.color_palette("hls", 10)
sns.scatterplot(x="pix_val_77", y="pix_val_78", hue="actual_class", palette=color, data = pixel_df, legend="full")
plt.legend(loc='upper left');
PCA Feature Reduction / Model Optimization¶
Use PCA decomposition to reduce the number of input features from 154 (already PCA-reduced) to 2¶
# Separating out the features
features = [*pixel_data][1:] # ['pix_val_0', 'pix_val_1',...]
x = pixel_df.loc[:, features].values
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['principal component 1', 'principal component 2'])
pixel_pca_df = pd.concat([principalDf, pixel_df[['actual_class']]], axis = 1)
pixel_pca_df.head().round(3)
| | principal component 1 | principal component 2 | actual_class |
|---|---|---|---|
| 0 | 0.725 | -2.433 | 7 |
| 1 | 0.473 | 1.005 | 3 |
| 2 | -0.094 | -3.010 | 4 |
| 3 | 0.221 | -0.725 | 6 |
| 4 | -3.680 | 2.086 | 1 |
pca.explained_variance_ratio_
array([0.102, 0.074], dtype=float32)
Use a scatter plot to visualize the predictive power of the two principal component values.¶
plt.figure(figsize=(16,10))
sns.scatterplot(
x="principal component 1", y="principal component 2",
hue="actual_class",
palette=sns.color_palette("hls", 10),
data=pixel_pca_df,
legend="full",
alpha=0.3
);
Use PCA decomposition to reduce the (activation) features from 85 (= num of hidden nodes) to 2¶
# Separating out the features
features = [*activation_data][1:] # ['act_val_0', 'act_val_1',...]
x = activation_df.loc[:, features].values
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['principal component 1', 'principal component 2'])
principalDf.head().round(3)
| | principal component 1 | principal component 2 |
|---|---|---|
| 0 | -1.197 | 2.767 |
| 1 | 4.828 | -3.830 |
| 2 | -0.279 | 2.659 |
| 3 | -1.828 | 1.836 |
| 4 | 3.974 | 1.436 |
activation_pca_df = pd.concat([principalDf, activation_df[['actual_class']]], axis = 1)
activation_pca_df.head().round(3)
| | principal component 1 | principal component 2 | actual_class |
|---|---|---|---|
| 0 | -1.197 | 2.767 | 7 |
| 1 | 4.828 | -3.830 | 3 |
| 2 | -0.279 | 2.659 | 4 |
| 3 | -1.828 | 1.836 | 6 |
| 4 | 3.974 | 1.436 | 1 |
ev=pca.explained_variance_ratio_
ev
array([0.108, 0.087], dtype=float32)
print(f'The {len(ev)} principal components summed together {ev[0]:.3f} + {ev[1]:.3f} = {sum(ev):.3f} explained variance')
The 2 principal components summed together 0.108 + 0.087 = 0.196 explained variance
plt.figure(figsize=(16,10))
sns.scatterplot(
x="principal component 1", y="principal component 2",
hue="actual_class",
palette=sns.color_palette("hls", 10),
data=activation_pca_df,
legend="full",
alpha=0.3
);
Use PCA decomposition to reduce the (activation) features from 85 (= num of hidden nodes) to 3¶
# Separating out the features
features = [*activation_data][1:] # ['act_val_0', 'act_val_1',...]
x = activation_df.loc[:, features].values
pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['pca-one', 'pca-two', 'pca-three'])
principalDf.head(10).round(3).T
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| pca-one | -1.197 | 4.828 | -0.279 | -1.828 | 3.974 | 1.828 | 4.959 | -7.113 | -0.993 | 2.806 |
| pca-two | 2.767 | -3.830 | 2.659 | 1.836 | 1.436 | -2.776 | 1.040 | -3.955 | 1.950 | -4.261 |
| pca-three | 1.414 | 1.610 | 2.149 | -1.709 | -1.596 | -3.101 | -1.840 | -1.486 | 3.595 | -0.750 |
ev=pca.explained_variance_ratio_
ev
array([0.108, 0.087, 0.077], dtype=float32)
print(f'The {len(ev)} principal components summed together {ev[0]:.3f} + {ev[1]:.3f} + {ev[2]:.3f} = {sum(ev):.3f} explained variance')
The 3 principal components summed together 0.108 + 0.087 + 0.077 = 0.273 explained variance
activation_pca_df = pd.concat([principalDf, activation_df[['actual_class']]], axis = 1)
activation_pca_df.head().round(3)
| | pca-one | pca-two | pca-three | actual_class |
|---|---|---|---|---|
| 0 | -1.197 | 2.767 | 1.414 | 7 |
| 1 | 4.828 | -3.830 | 1.610 | 3 |
| 2 | -0.279 | 2.659 | 2.149 | 4 |
| 3 | -1.828 | 1.836 | -1.709 | 6 |
| 4 | 3.974 | 1.436 | -1.596 | 1 |
Use t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the (activation) features from 85 (= num of hidden nodes) to 2¶
activation_df.shape
(55000, 86)
N=55000
activation_df_subset = activation_df.iloc[:N].copy()
activation_df_subset.shape
(55000, 86)
data_subset = activation_df_subset[features].values
data_subset.shape
(55000, 85)
%%time
tsne = TSNE(n_components=2 # embed the 85 hidden-node activations in 2 dimensions
            ,init='pca'
            ,learning_rate='auto'
            ,verbose=1
            ,perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(data_subset)
[t-SNE] Computing 121 nearest neighbors... [t-SNE] Indexed 55000 samples in 0.010s... [t-SNE] Computed neighbors for 55000 samples in 3.804s... [t-SNE] Computed conditional probabilities for sample 1000 / 55000 ... [t-SNE] Computed conditional probabilities for sample 55000 / 55000 [t-SNE] Mean sigma: 1.889768 [t-SNE] KL divergence after 250 iterations with early exaggeration: 90.279701 [t-SNE] KL divergence after 300 iterations: 3.914978 CPU times: user 3min 2s, sys: 7.97 s, total: 3min 10s Wall time: 2min 29s
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
tsne_results
array([[0.67 , 0.902],
[0.711, 0.163],
[0.305, 0.746],
...,
[0.404, 0.374],
[0.152, 0.42 ],
[0.617, 0.562]], dtype=float32)
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
plt.scatter(tsne_results[:,0],tsne_results[:,1], c=train_labels, s=10, cmap=cmap)
image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
dist = np.sum((position - image_positions) ** 2, axis=1)
if np.min(dist) > 0.02: # if far enough from other images
image_positions = np.r_[image_positions, [position]]
imagebox = mpl.offsetbox.AnnotationBbox(
mpl.offsetbox.OffsetImage(train_images[index].reshape(28,28), cmap="binary"),
position, bboxprops={"edgecolor": cmap(train_labels[index]), "lw": 2})
plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()
Experiment 5¶
- 28x28 images (784 pixels) dimensionality reduction via random forests
- hidden layer: 85 nodes
- output layer: 10 nodes
Reducing dimensionality of the data with Random Forests.¶
We create a Random Forest Classifier (with the default 100 trees) and use it to find the relative importance of the 784 features (pixels) in the training set. We produce a heat map to visualize the relative importance of the features (using code from Hands-On Machine Learning by A. Géron). Finally, we select the 70 most important features (pixels) from the training, validation and test images and test our 'best' model on them.
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rnd_clf.fit(train_images,train_labels)
RandomForestClassifier(random_state=42)
plt.figure(figsize = (12, 8))
plot_digit(rnd_clf.feature_importances_)
cbar = plt.colorbar(ticks=[rnd_clf.feature_importances_.min(), rnd_clf.feature_importances_.max()])
cbar.ax.set_yticklabels(['Not important', 'Very important'])
plt.show()
n = 70
imp_arr = rnd_clf.feature_importances_
idx = (-imp_arr).argsort()[:n] # get the indices of the 70 "most important" features/pixels
len(idx)
70
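The `(-imp_arr).argsort()[:n]` idiom used above can be checked on a small example: negating the array turns an ascending argsort into a most-important-first ordering.

```python
import numpy as np

# Toy importances for 5 "pixels"
imp = np.array([0.10, 0.50, 0.05, 0.30, 0.05])
# argsort of the negated array orders indices from largest to smallest importance
top3 = (-imp).argsort()[:3]
print(top3)  # → [1 3 0]
```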
Create Training and Test Examples Leveraging 70 Pixels¶
# Create training, validation and test images using just the 70 pixel locations obtained above
train_images_sm = train_images[:,idx]
val_images_sm = val_images[:,idx]
test_images_sm = test_images[:,idx]
train_images_sm.shape, val_images.shape, test_images_sm.shape # the reduced train/test images have dimension 70; val_images here is the original 784-pixel array
((55000, 70), (5000, 784), (10000, 70))
# convert a flat index n (0 <= n < 784) into a pair of coordinates in a size x size grid
def pair(n, size):
    x = n // size
    y = n % size
    return x, y
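A quick sanity check of the index arithmetic (flat index 29 in a 28-wide grid sits at row 1, column 1):

```python
# Same helper as above, repeated so this snippet runs standalone
def pair(n, size):
    return n // size, n % size

assert pair(0, 28) == (0, 0)       # top-left corner
assert pair(29, 28) == (1, 1)      # second row, second column
assert pair(783, 28) == (27, 27)   # bottom-right corner of a 28x28 image
print("pair() checks passed")
```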
plt.figure(figsize = (12, 8))
plt.imshow(train_images[1].reshape(28,28),cmap='binary')
x, y = np.array([pair(k, 28) for k in idx]).T # x = row, y = column
plt.scatter(y, x, color='red', s=20) # imshow puts columns on the horizontal axis and rows on the vertical
<matplotlib.collections.PathCollection at 0x31d99cb10>
model = Sequential([
    Dense(name='hidden_layer_1', units=85, activation='relu', input_shape=(70,)),
    Dense(name='output_layer', units=10, activation='softmax')
])
Build the DNN model¶
model.summary() # prints a summary representation of the model
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
hidden_layer_1 (Dense) (None, 85) 6035
output_layer (Dense) (None, 10) 860
=================================================================
Total params: 6,895
Trainable params: 6,895
Non-trainable params: 0
_________________________________________________________________
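The parameter counts in the summary can be verified by hand: each Dense layer has (inputs × units) weights plus units biases.

```python
# hidden layer: 70 inputs -> 85 units; output layer: 85 inputs -> 10 units
hidden_params = 70 * 85 + 85   # weights + biases
output_params = 85 * 10 + 10
print(hidden_params, output_params, hidden_params + output_params)  # → 6035 860 6895
```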
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Compile the DNN model¶
model.compile(optimizer='rmsprop',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
Train the DNN model¶
history = model.fit(train_images_sm,
                    train_labels,
                    epochs=30,
                    validation_data=(val_images_sm, val_labels),
                    callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_5_optimized.h5",
                                                                  save_best_only=True,
                                                                  save_weights_only=False)])
Epoch 1/30 1719/1719 [==============================] - 1s 338us/step - loss: 0.6280 - accuracy: 0.8141 - val_loss: 0.4373 - val_accuracy: 0.8750 Epoch 2/30 1719/1719 [==============================] - 1s 304us/step - loss: 0.4143 - accuracy: 0.8756 - val_loss: 0.3550 - val_accuracy: 0.9002 Epoch 3/30 1719/1719 [==============================] - 1s 304us/step - loss: 0.3448 - accuracy: 0.8967 - val_loss: 0.3083 - val_accuracy: 0.9120 Epoch 4/30 1719/1719 [==============================] - 1s 304us/step - loss: 0.3028 - accuracy: 0.9095 - val_loss: 0.2741 - val_accuracy: 0.9190 Epoch 5/30 1719/1719 [==============================] - 1s 305us/step - loss: 0.2751 - accuracy: 0.9175 - val_loss: 0.2630 - val_accuracy: 0.9230 Epoch 6/30 1719/1719 [==============================] - 1s 305us/step - loss: 0.2564 - accuracy: 0.9228 - val_loss: 0.2507 - val_accuracy: 0.9266 Epoch 7/30 1719/1719 [==============================] - 1s 304us/step - loss: 0.2423 - accuracy: 0.9275 - val_loss: 0.2387 - val_accuracy: 0.9326 Epoch 8/30 1719/1719 [==============================] - 1s 302us/step - loss: 0.2318 - accuracy: 0.9306 - val_loss: 0.2432 - val_accuracy: 0.9320 Epoch 9/30 1719/1719 [==============================] - 1s 306us/step - loss: 0.2236 - accuracy: 0.9323 - val_loss: 0.2301 - val_accuracy: 0.9332 Epoch 10/30 1719/1719 [==============================] - 1s 302us/step - loss: 0.2162 - accuracy: 0.9349 - val_loss: 0.2317 - val_accuracy: 0.9334 Epoch 11/30 1719/1719 [==============================] - 1s 306us/step - loss: 0.2093 - accuracy: 0.9374 - val_loss: 0.2198 - val_accuracy: 0.9376 Epoch 12/30 1719/1719 [==============================] - 1s 305us/step - loss: 0.2043 - accuracy: 0.9395 - val_loss: 0.2304 - val_accuracy: 0.9358 Epoch 13/30 1719/1719 [==============================] - 1s 305us/step - loss: 0.1989 - accuracy: 0.9409 - val_loss: 0.2209 - val_accuracy: 0.9380 Epoch 14/30 1719/1719 [==============================] - 1s 305us/step - loss: 0.1953 - accuracy: 
0.9422 - val_loss: 0.2250 - val_accuracy: 0.9410 Epoch 15/30 1719/1719 [==============================] - 1s 304us/step - loss: 0.1919 - accuracy: 0.9426 - val_loss: 0.2255 - val_accuracy: 0.9358 Epoch 16/30 1719/1719 [==============================] - 1s 304us/step - loss: 0.1880 - accuracy: 0.9433 - val_loss: 0.2329 - val_accuracy: 0.9328 Epoch 17/30 1719/1719 [==============================] - 1s 302us/step - loss: 0.1850 - accuracy: 0.9452 - val_loss: 0.2215 - val_accuracy: 0.9410 Epoch 18/30 1719/1719 [==============================] - 1s 301us/step - loss: 0.1815 - accuracy: 0.9462 - val_loss: 0.2220 - val_accuracy: 0.9382 Epoch 19/30 1719/1719 [==============================] - 1s 301us/step - loss: 0.1798 - accuracy: 0.9469 - val_loss: 0.2201 - val_accuracy: 0.9384 Epoch 20/30 1719/1719 [==============================] - 1s 302us/step - loss: 0.1761 - accuracy: 0.9480 - val_loss: 0.2210 - val_accuracy: 0.9390 Epoch 21/30 1719/1719 [==============================] - 1s 303us/step - loss: 0.1750 - accuracy: 0.9487 - val_loss: 0.2206 - val_accuracy: 0.9394 Epoch 22/30 1719/1719 [==============================] - 1s 315us/step - loss: 0.1726 - accuracy: 0.9488 - val_loss: 0.2350 - val_accuracy: 0.9340 Epoch 23/30 1719/1719 [==============================] - 1s 308us/step - loss: 0.1707 - accuracy: 0.9497 - val_loss: 0.2178 - val_accuracy: 0.9416 Epoch 24/30 1719/1719 [==============================] - 1s 308us/step - loss: 0.1692 - accuracy: 0.9497 - val_loss: 0.2142 - val_accuracy: 0.9424 Epoch 25/30 1719/1719 [==============================] - 1s 305us/step - loss: 0.1673 - accuracy: 0.9497 - val_loss: 0.2277 - val_accuracy: 0.9388 Epoch 26/30 1719/1719 [==============================] - 1s 306us/step - loss: 0.1657 - accuracy: 0.9512 - val_loss: 0.2218 - val_accuracy: 0.9364 Epoch 27/30 1719/1719 [==============================] - 1s 306us/step - loss: 0.1646 - accuracy: 0.9516 - val_loss: 0.2252 - val_accuracy: 0.9392 Epoch 28/30 1719/1719 
[==============================] - 1s 305us/step - loss: 0.1632 - accuracy: 0.9520 - val_loss: 0.2228 - val_accuracy: 0.9436 Epoch 29/30 1719/1719 [==============================] - 1s 306us/step - loss: 0.1620 - accuracy: 0.9520 - val_loss: 0.2253 - val_accuracy: 0.9370 Epoch 30/30 1719/1719 [==============================] - 1s 306us/step - loss: 0.1610 - accuracy: 0.9524 - val_loss: 0.2260 - val_accuracy: 0.9422
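Because `save_best_only=True`, the checkpoint file keeps the weights from the epoch with the lowest monitored `val_loss` (epoch 24 in the log above, val_loss 0.2142), not the final epoch. A minimal illustration of that selection rule, using a few values sampled from the log:

```python
# save_best_only keeps the epoch whose monitored quantity (val_loss) is lowest
val_losses = [0.2301, 0.2198, 0.2142, 0.2260]  # a few values taken from the log above
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
print(best_epoch, val_losses[best_epoch])  # → 2 0.2142
```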
Evaluate the DNN model¶
model = tf.keras.models.load_model("exp_5_optimized.h5")
test_loss, test_acc = model.evaluate(test_images_sm, test_labels)
313/313 [==============================] - 0s 260us/step - loss: 0.2242 - accuracy: 0.9368
print(f'test acc: {test_acc}, test loss: {test_loss}')
test acc: 0.9368000030517578, test loss: 0.22416523098945618
Reviewing Performance¶
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
Making Predictions¶
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+
# pred_classes = model.predict(train_images)
# alternate method:
pred_train=model.predict(train_images_sm)
pred_classes=np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 193us/step
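The `np.argmax(..., axis=1)` step simply picks the highest-probability column in each row; using rounded softmax outputs like those for the first training image:

```python
import numpy as np

# Rounded softmax outputs for one image; class 7 has the largest probability (0.506)
probs = np.array([[0.0, 0.0, 0.3953, 0.0181, 0.0002, 0.0, 0.0, 0.5060, 0.0001, 0.0804]])
pred = np.argmax(probs, axis=1)
print(pred)  # → [7]
```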
print_validation_report(train_labels, pred_classes)
Classification Report
precision recall f1-score support
0 0.97 0.97 0.97 5444
1 0.98 0.98 0.98 6179
2 0.93 0.94 0.94 5470
3 0.95 0.95 0.95 5638
4 0.95 0.96 0.95 5307
5 0.96 0.92 0.94 4987
6 0.97 0.97 0.97 5417
7 0.96 0.96 0.96 5715
8 0.95 0.94 0.95 5389
9 0.93 0.94 0.93 5454
accuracy 0.95 55000
macro avg 0.95 0.95 0.95 55000
weighted avg 0.95 0.95 0.95 55000
Accuracy Score: 0.9544181818181818
Root Mean Square Error: 0.8948234970702831
Create the confusion matrix¶
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00% | 0.00% | 39.53% | 1.81% | 0.02% | 0.00% | 0.00% | 50.60% | 0.01% | 8.04% |
| 1 | 0.00% | 0.00% | 0.00% | 99.65% | 0.00% | 0.00% | 0.00% | 0.00% | 0.01% | 0.34% |
| 2 | 0.00% | 0.00% | 0.00% | 0.00% | 4.25% | 0.00% | 0.00% | 0.04% | 0.00% | 95.71% |
| 3 | 0.00% | 0.00% | 0.00% | 0.00% | 0.12% | 0.00% | 99.87% | 0.00% | 0.00% | 0.00% |
| 4 | 0.00% | 99.91% | 0.02% | 0.01% | 0.00% | 0.02% | 0.00% | 0.00% | 0.04% | 0.00% |
| 5 | 0.00% | 0.00% | 0.00% | 0.00% | 0.02% | 0.00% | 0.00% | 0.00% | 99.97% | 0.00% |
| 6 | 0.00% | 99.90% | 0.03% | 0.00% | 0.00% | 0.00% | 0.00% | 0.02% | 0.05% | 0.00% |
| 7 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 8 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.01% | 99.99% |
| 9 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 99.99% | 0.00% |
| 10 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 11 | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 12 | 0.00% | 99.77% | 0.09% | 0.01% | 0.00% | 0.01% | 0.00% | 0.00% | 0.13% | 0.00% |
| 13 | 60.69% | 0.00% | 22.06% | 10.27% | 0.00% | 6.44% | 0.00% | 0.00% | 0.05% | 0.50% |
| 14 | 0.00% | 0.00% | 0.01% | 0.01% | 0.01% | 0.00% | 0.00% | 99.91% | 0.00% | 0.07% |
| 15 | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 16 | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 17 | 0.00% | 0.00% | 1.93% | 1.68% | 0.00% | 0.00% | 0.00% | 3.45% | 0.00% | 92.93% |
| 18 | 0.01% | 0.00% | 0.05% | 0.00% | 0.01% | 0.00% | 99.92% | 0.00% | 0.02% | 0.00% |
| 19 | 96.07% | 0.00% | 3.92% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.01% |
Visualize the confusion matrix¶
mtx = plot_confusion_matrix(train_labels,pred_classes)
cl_a, cl_b = 5, 3
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]
plt.figure(figsize=(8,8))
p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)
plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);
p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")
# plt.savefig("error_analysis_digits_plot_EXP1_valid")
plt.show()
ACTIVATION EXTRACTION & SCATTERPLOT¶
- random forest reduction to 70 pixels performs worse than the earlier 128-node network and the PCA reduction
- accuracy after 30 epochs is lower than in those experiments
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
print(f"There are {len(layer_outputs)} layers")
layer_outputs # description of the layers
There are 2 layers
[<KerasTensor: shape=(None, 85) dtype=float32 (created by layer 'hidden_layer_1')>, <KerasTensor: shape=(None, 10) dtype=float32 (created by layer 'output_layer')>]
# Get the outputs of all the hidden nodes for each of the 55000 training images
activations = activation_model.predict(train_images_sm)
hidden_layer_activation = activations[0]
output_layer_activations = activations[1]
hidden_layer_activation.shape # each of the 85 hidden nodes has one activation value per training image
1719/1719 [==============================] - 0s 227us/step
(55000, 85)
output_layer_activations.shape
(55000, 10)
print(f"The maximum activation value of the hidden nodes in the hidden layer is \
{hidden_layer_activation.max()}")
The maximum activation value of the hidden nodes in the hidden layer is 8.1651611328125
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True) # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output layer has shape {output_layer_activation.shape}")
print(f"The outputs for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output layer has shape (55000, 10) The outputs for the first image are [0. 0. 0.395 0.018 0. 0. 0. 0.506 0. 0.08 ] The sum of the probabilities is (approximately) 1.0
# To see how closely the hidden node activation values correlate with the class labels
# Let us use seaborn for the boxplots this time.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_val_0', x='actual_class',
                    data=activation_df[['act_val_0','actual_class']],
                    width=0.5,
                    palette="colorblind")
activation_df.groupby("actual_class")["act_val_0"].apply(
    lambda x: [round(min(x.tolist()), 2), round(max(x.tolist()), 2)]
).reset_index().rename(columns={"act_val_0": "range_of_act_values"})
| actual_class | range_of_act_values | |
|---|---|---|
| 0 | 0 | [0.0, 5.56] |
| 1 | 1 | [0.0, 5.25] |
| 2 | 2 | [0.0, 5.23] |
| 3 | 3 | [0.0, 6.06] |
| 4 | 4 | [0.0, 3.04] |
| 5 | 5 | [0.0, 8.19] |
| 6 | 6 | [0.0, 6.19] |
| 7 | 7 | [0.0, 4.49] |
| 8 | 8 | [0.0, 5.89] |
| 9 | 9 | [0.0, 4.02] |
# Get the dataframe of all the pixel values
pixel_data = {'actual_class': train_labels}
for k in range(0, 154):
    pixel_data[f"pix_val_{k}"] = train_images_red[:,k]
pixel_df = pd.DataFrame(pixel_data)
pixel_df.head(15).round(3).T
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| actual_class | 7.000 | 3.000 | 4.000 | 6.000 | 1.000 | 8.000 | 1.000 | 0.000 | 9.000 | 8.000 | 0.000 | 3.000 | 1.000 | 2.000 | 7.000 |
| pix_val_0 | 0.725 | 0.473 | -0.094 | 0.221 | -3.679 | 1.303 | -3.645 | 6.441 | -0.511 | 1.735 | 5.038 | 0.703 | -3.438 | 3.187 | -1.907 |
| pix_val_1 | -2.433 | 1.005 | -3.010 | -0.725 | 2.086 | 0.938 | 2.637 | 0.618 | -2.159 | 2.033 | 1.001 | 2.985 | 0.737 | 1.372 | -1.918 |
| pix_val_2 | 1.537 | 0.502 | 2.129 | -2.279 | -0.551 | -1.222 | -0.458 | -1.207 | 2.395 | 0.174 | 1.917 | 1.507 | 0.137 | 0.066 | 0.498 |
| pix_val_3 | -2.445 | 3.738 | 0.838 | -1.903 | -0.906 | 2.802 | -0.120 | 0.589 | -0.949 | 1.990 | -0.634 | 1.158 | -0.161 | 2.586 | -0.127 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| pix_val_149 | 0.242 | 0.161 | 0.033 | -0.102 | -0.082 | -0.114 | 0.094 | 0.166 | -0.262 | -0.170 | 0.033 | -0.044 | -0.011 | -0.184 | -0.051 |
| pix_val_150 | -0.115 | -0.105 | -0.006 | 0.040 | -0.061 | 0.050 | -0.130 | -0.191 | -0.010 | -0.439 | 0.139 | 0.018 | 0.115 | 0.028 | -0.050 |
| pix_val_151 | -0.367 | -0.218 | 0.013 | 0.048 | -0.117 | -0.117 | 0.116 | -0.141 | 0.022 | -0.228 | -0.135 | 0.150 | 0.002 | 0.108 | -0.111 |
| pix_val_152 | -0.063 | 0.148 | 0.125 | -0.071 | 0.007 | 0.083 | -0.046 | 0.120 | 0.017 | 0.283 | -0.077 | 0.002 | 0.080 | 0.006 | 0.256 |
| pix_val_153 | -0.264 | 0.007 | -0.035 | 0.028 | 0.021 | -0.087 | 0.055 | 0.071 | 0.183 | 0.041 | -0.132 | 0.142 | -0.173 | 0.105 | 0.268 |
155 rows × 15 columns
plt.figure(figsize=(16, 10))
color = sns.color_palette("hls", 10)
sns.scatterplot(x="pix_val_77", y="pix_val_78", hue="actual_class", palette=color, data = pixel_df, legend="full")
plt.legend(loc='upper left');
# Time Stamp
current_time = datetime.datetime.now()
formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
# Print the formatted time
print("Last Run:", formatted_time)
Last Run: 2024-10-07 00:05:05